Hi Nishith,

Thank you for the quick response. I will send the commit metadata as soon as I can. I assume the commit metadata you are looking for is the one in the .hoodie/ directory and not the archived one. There are both inflight and commit metadata files; I am assuming you want to look at the inflight one. I will get back to you with further details.

Thanks,
Kabeer.

On Jul 3 2019, at 2:19 am, nishith agarwal <[email protected]> wrote:
> Kabir,
>
> Could you share the content of your commit metadata? You can list the
> timeline, find the latest commit in the timeline, perform a cat and paste
> the results (that you can share).
>
> Thanks,
> Nishith
>
> On Tue, Jul 2, 2019 at 4:53 PM Kabeer Ahmed <[email protected]> wrote:
> > Hi Vinoth and other HUDI Experts,
> > I am stuck while processing inserts into HUDI. The process picks up CSV
> > files and loads them into HUDI. The process seems to be stuck at:
> > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679
> > Log is below:
> >
> > 2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
> > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, sizeBytes=435362}]
> > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0
> >
> > Looking at the last line in the log: "unassignedInserts => 8,
> > totalInsertBuckets => 2147483647, recordsPerBucket => 0", this causes the
> > below code to loop for quite long causing heap issues.
> >
> > logger.info(
> >     "After small file assignment: unassignedInserts => " + totalUnassignedInserts
> >         + ", totalInsertBuckets => " + insertBuckets + ", recordsPerBucket => "
> >         + insertRecordsPerBucket);
> > for (int b = 0; b < insertBuckets; b++) {
> >   bucketNumbers.add(totalBuckets);
> >   recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
> >   BucketInfo bucketInfo = new BucketInfo();
> >   bucketInfo.bucketType = BucketType.INSERT;
> >   bucketInfoMap.put(totalBuckets, bucketInfo);
> >   totalBuckets++;
> > }
> >
> > Has someone seen the issue? Do I need to file a bug or it is something to
> > do with my misconfiguration?
> >
> > Any help is highly appreciated.
> > Thanks
> > Kabeer.
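
For reference, the numbers in the last quoted log line can be reproduced in isolation. The following is only a minimal sketch and not the actual Hudi code: the class and method names are hypothetical, the 120 MB maximum file size is an assumed value, and the use of floating-point division for the bucket count is an assumption, chosen because it yields 2147483647 rather than throwing a divide-by-zero error. It shows how an AvgRecordSize of 9223372036854775807 (Long.MAX_VALUE) could lead to recordsPerBucket => 0 and totalInsertBuckets => 2147483647.

```java
public class InsertBucketSketch {

    // Hypothetical helper: how many records fit into one file of
    // maxFileSizeBytes. Long division truncates, so an average record
    // size of Long.MAX_VALUE gives 0 for any realistic file size.
    static long recordsPerBucket(long maxFileSizeBytes, long avgRecordSize) {
        return maxFileSizeBytes / avgRecordSize;
    }

    // Hypothetical helper: ASSUMES the bucket count is derived with
    // floating-point division. Dividing by 0.0 gives Infinity, and casting
    // Infinity to int saturates at Integer.MAX_VALUE (2147483647).
    static int insertBuckets(long unassignedInserts, long recordsPerBucket) {
        return (int) Math.ceil((1.0 * unassignedInserts) / recordsPerBucket);
    }

    // One possible guard, shown for illustration only (not a proposed
    // patch): never let the per-bucket record count drop below 1.
    static int guardedInsertBuckets(long unassignedInserts, long recordsPerBucket) {
        return insertBuckets(unassignedInserts, Math.max(recordsPerBucket, 1L));
    }

    public static void main(String[] args) {
        long avgRecordSize = Long.MAX_VALUE;     // value taken from the log
        long maxFileSize = 120L * 1024 * 1024;   // assumed value, 120 MB
        long unassignedInserts = 8;              // value taken from the log

        long perBucket = recordsPerBucket(maxFileSize, avgRecordSize);
        System.out.println("recordsPerBucket => " + perBucket);
        System.out.println("totalInsertBuckets => "
            + insertBuckets(unassignedInserts, perBucket));
        System.out.println("guarded totalInsertBuckets => "
            + guardedInsertBuckets(unassignedInserts, perBucket));
    }
}
```

If the sketch matches what happens in the real code path, the loop in the quoted snippet would try to allocate about 2.1 billion BucketInfo entries, which would be consistent with the heap issues described. Whether AvgRecordSize ends up at Long.MAX_VALUE because of the commit metadata is exactly what the requested metadata should help to confirm.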
