[ https://issues.apache.org/jira/browse/CRUNCH-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682861#comment-15682861 ]
Tony Wu commented on CRUNCH-627:
--------------------------------

Thx, John!

> Shard API doesn't work well with parquet target
> -----------------------------------------------
>
>                 Key: CRUNCH-627
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-627
>             Project: Crunch
>          Issue Type: Bug
>          Components: MapReduce Patterns
>    Affects Versions: 0.13.0
>         Environment: Linux X86
>            Reporter: Tony Wu
>              Labels: patch
>             Fix For: 0.13.0
>
>         Attachments: CRUNCH-627.patch
>
>
> PCollection<User> outTable = oldTable.union(newTable);
> Shard.shard(outTable, 10).write(new AvroParquetFileTarget(tempOut + path),
>     Target.WriteMode.OVERWRITE);
> However, I have another job that reads the output of the above target and
> uses one of its fields as the key. That job's output looks like this:
> 3.0.3.1.2.CH24_RELEASE 2
> 3.0.3.1.2.CH24_RELEASEE 1
> 3.0.3.1.2.CH24_RELEASEEA 1
> 3.0.3.1.2.CH24_RELEASEEAS 1
> 3.0.3.1.2.CH24_RELEASEEASE 29
> 3.0.3.1.2.CH24_RELEASEEASES 160
> 3.0.3.1.2.CH24_RELEASEEASESE 85
> 3.0.3.1.2.CH24_RELEASEEASESEE 14
> 3.0.3.1.2.CH24_RELEASEEASESEEE 4
> 3.0.3.1.2.CH24_RELEASEEASESEEES 1
> An extra suffix is appended to the keys of the PTable: all of them should
> end in RELEASE, not RELEASEEASE and so on.
> If I remove the Shard and keep everything else the same, the output looks
> normal:
> 3.0.0.1.2.CH.1.4_RELEASE 1
> 3.0.1.1.2.CH22_RELEASE 1622
> 3.0.1.1.2.CH23_RELEASE 10607
> 3.0.14.1.2.CH.1.3_RELEASE 18080
> 3.0.19.1.2.TC21_RELEASE 5
> 3.0.2.1.2.CH11_RELEASE 3
> 3.0.2.1.2.TC21_RELEASE 4
> 3.0.20.1.2.TC21_RELEASE 247
> 3.0.20.7.2.SX.1.2A_RELEASE 2
> 3.0.20.8.2.SX.1.3A_RELEASE 1

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
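The issue does not state the root cause of the corrupted keys. One mechanism that produces exactly this kind of trailing-suffix damage (an assumption here, not something the report confirms) is a string holder whose backing byte array is reused between records and only grows, read back without honoring its valid length. Avro's `Utf8`, for instance, reuses its byte array, and its bytes are only valid up to `getByteLength()`. The sketch below is a hypothetical, dependency-free illustration: `StaleBufferDemo`, `ReusableText`, and its methods are invented names, not Crunch or Avro API. Reusing two keys from the report, a 26-byte key followed by a 22-byte key yields one of the corrupted keys listed above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (not Crunch or Avro code): a value holder whose
// backing byte array is reused between records and only ever grows.
public class StaleBufferDemo {

    static final class ReusableText {
        byte[] bytes = new byte[0]; // backing array, may be longer than the value
        int length = 0;             // number of valid bytes in `bytes`

        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < src.length) {
                bytes = new byte[src.length]; // grow, never shrink
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        // Buggy read: decodes the whole backing array, stale tail included.
        String readIgnoringLength() {
            return new String(bytes, StandardCharsets.UTF_8);
        }

        // Correct read: decodes only the first `length` bytes.
        String readRespectingLength() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ReusableText key = new ReusableText();
        key.set("3.0.20.7.2.SX.1.2A_RELEASE"); // 26 bytes
        key.set("3.0.3.1.2.CH24_RELEASE");     // 22 bytes, same backing array
        System.out.println(key.readIgnoringLength());   // 3.0.3.1.2.CH24_RELEASEEASE
        System.out.println(key.readRespectingLength()); // 3.0.3.1.2.CH24_RELEASE
    }
}
```

The stale bytes are simply the tail of the previous, longer value, which is why the suffixes look like fragments of "RELEASE" rather than random garbage. Copying only the valid prefix (or deep-copying reused objects before they are buffered) avoids the effect.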