jackwang2 commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present
URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-517089256
 
 
   @n3nash No, I didn't. The main logic is just global deduplication; the
   code is pasted below:
   
      df.dropDuplicates(recordKey)
        .write
        .format("com.uber.hoodie")
        .mode(SaveMode.Append)
        .option(HoodieWriteConfig.TABLE_NAME, tableName)
        .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name)
        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, recordKey)
        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, partitionCol)
        .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
        .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, storageType)
        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, preCombineCol)
        .option("hoodie.consistency.check.enabled", "true")
        .option("hoodie.parquet.small.file.limit", 1024 * 1024 * 128)
        .save(tgtFilePath)
   
   Thanks,
   Jack
   
   On Thu, Aug 1, 2019 at 9:01 AM n3nash <[email protected]> wrote:
   
    > It looks like the "Not an Avro data file" exception is thrown when a
    > 0-byte stream is read into the DataFileReader, as can be seen here:
    > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader.java#L55
    > and here:
    > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/file/DataFileConstants.java#L29
    >
    > From the stack trace (by tracing the line numbers), it looks like the
    > CLEAN file is failing to be archived. I looked at the clean logic: we do
    > create clean files even when there is nothing to clean, but that does
    > not result in a 0-byte file; it still contains some valid Avro data. I'm
    > wondering whether some sort of race condition leads to archiving running
    > while the clean file is still 0 bytes.
   >
    > @jackwang2 <https://github.com/jackwang2> How are you running the cleaner
    > and the archival process? Are you explicitly doing anything there?
   >
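    The magic-byte check behind that exception can be sketched as follows.
    This is a hedged, self-contained stand-in, not Avro's actual
    DataFileReader (the class and method names below are illustrative): per
    the DataFileConstants link above, Avro data files open with the 4-byte
    magic "Obj" plus a version byte of 1, so a 0-byte clean file cannot
    supply that header and the read fails with "Not an Avro data file."

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Arrays;

    // Illustrative sketch of the header validation Avro performs when
    // opening a data file. A 0-byte file trips the check immediately.
    public class AvroMagicCheck {
        // Avro file magic: 'O', 'b', 'j', then version byte 1
        // (mirrors DataFileConstants.MAGIC in the Avro source).
        static final byte[] MAGIC = {'O', 'b', 'j', 1};

        static void checkMagic(Path file) throws IOException {
            byte[] bytes = Files.readAllBytes(file);
            // Zero-padded copy of the first 4 bytes; an empty file yields
            // all zeros, which can never match the magic.
            byte[] header = Arrays.copyOf(bytes, MAGIC.length);
            if (bytes.length < MAGIC.length || !Arrays.equals(header, MAGIC)) {
                throw new IOException("Not an Avro data file.");
            }
        }

        public static void main(String[] args) throws IOException {
            Path empty = Files.createTempFile("clean", ".avro"); // 0 bytes
            try {
                checkMagic(empty);
                System.out.println("valid header");
            } catch (IOException e) {
                System.out.println(e.getMessage());
            } finally {
                Files.delete(empty);
            }
        }
    }
    ```

    If the archiver races the cleaner and reads the clean file before any
    bytes are flushed, this is exactly the failure mode it would hit.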
   
   
    -- 
    *Jianbin Wang*
    Sr. Engineer II, Data

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
