Hi Balaji,

I am not very confident about editing the code to find the newest instant with 
non-zero records written, so I would earnestly request someone who has worked 
on this before to take a look. It would be more efficient for someone who knows 
this code to add the fix than for someone like me. (I would honestly like to 
work on the fix if someone is willing to do some hand-holding :) ).
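
To check that I have understood the suggestion, here is a rough sketch of the idea in plain Java. It is not the actual Hudi code: CommitTotals, averageFromNewestNonZero, the 1024-byte fallback and the 120 MB max file size are names and values I made up for illustration, and the bucket arithmetic in main is only my guess at how the numbers in the log came about. The totals for 20190702161629 are summed from the commit metadata quoted below; the zero-record commit is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class AvgRecordSizeSketch {

    /** Hypothetical holder for one completed commit's totals (not a Hudi class). */
    static final class CommitTotals {
        final String instantTime;
        final long bytesWritten;
        final long recordsWritten;

        CommitTotals(String instantTime, long bytesWritten, long recordsWritten) {
            this.instantTime = instantTime;
            this.bytesWritten = bytesWritten;
            this.recordsWritten = recordsWritten;
        }
    }

    /** Assumed fallback (bytes per record) when no commit has written any records. */
    static final long FALLBACK_RECORD_SIZE = 1024L;

    /** Average taken from a single commit; saturates when recordsWritten is 0. */
    static long naiveAverage(long bytesWritten, long recordsWritten) {
        // Floating-point division by zero gives Infinity, and casting Infinity
        // to long yields Long.MAX_VALUE.
        return (long) Math.ceil((1.0 * bytesWritten) / recordsWritten);
    }

    /** Walks commits newest-first and uses the first one that actually wrote records. */
    static long averageFromNewestNonZero(List<CommitTotals> newestFirst) {
        for (CommitTotals c : newestFirst) {
            if (c.recordsWritten > 0 && c.bytesWritten > 0) {
                return naiveAverage(c.bytesWritten, c.recordsWritten);
            }
        }
        return FALLBACK_RECORD_SIZE;
    }

    public static void main(String[] args) {
        // A commit with bytes but zero records reproduces the AvgRecordSize in the log.
        long badAvg = naiveAverage(848480L, 0L);
        System.out.println("naive avg => " + badAvg);

        // Assumed sizing arithmetic: a huge average leaves 0 records per bucket,
        // and dividing the 8 unassigned inserts by 0 saturates the int cast.
        long maxFileSize = 120L * 1024 * 1024; // assumed value, not from the thread
        long recordsPerBucket = maxFileSize / badAvg;
        int insertBuckets = (int) Math.ceil((1.0 * 8) / recordsPerBucket);
        System.out.println("recordsPerBucket => " + recordsPerBucket
            + ", insertBuckets => " + insertBuckets);

        // Skipping the zero-record commit falls back to the previous commit.
        List<CommitTotals> newestFirst = Arrays.asList(
            new CommitTotals("hypothetical-zero-record-commit", 848480L, 0L),
            new CommitTotals("20190702161629", 1669203L, 22044L));
        System.out.println("fixed avg => " + averageFromNewestNonZero(newestFirst));
    }
}
```

If this is roughly the right shape, the open question for me is where in the real timeline code such a walk should live.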
Thanks,
On Jul 3 2019, at 4:32 pm, Balaji Varadarajan <[email protected]> 
wrote:
> Thanks for finding the old issue. Looks like we replied around the same time 
> :) Yeah, makes sense. The fix would probably be finding the newest instant 
> with non-zero records written and then using it for average record 
> calculation. Let us know if you are interested in working on the fix.
> Balaji.V
> On Wednesday, July 3, 2019, 8:25:33 AM PDT, Kabeer Ahmed 
> <[email protected]> wrote:
>
> Hi Nishith and All,
> I think I figured out what was blocking the processing of files for me. The 
> code snippet that I had sent before is indeed the issue. I have raised a new 
> issue and added a few details at: 
> https://github.com/apache/incubator-hudi/issues/776 
> Can someone please have a look and advise what is going wrong?
>
> Thank you,
> Kabeer.
>
> On Jul 3 2019, at 11:38 am, Kabeer Ahmed <[email protected]> wrote:
> > Hi Nishith,
> >
> > Please find the latest commit data in the gist at: 
> > https://gist.github.com/smdahmed/5d811cb4833243a11ac09b9dc61e5b4d 
> > For your convenience, I have also copy pasted it below. Any help is highly 
> > appreciated.
> >
> > Thanks
> > Kabeer.
> >
> > 20190702161629.commit:
> > {
> > "partitionToWriteStats" : {
> > "2018/05/30" : [ {
> > "fileId" : "39cff0df-24e4-45b8-bff5-9b4f41c4096a",
> > "path" : 
> > "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet",
> > "prevCommit" : "20190702161417",
> > "numWrites" : 11614,
> > "numDeletes" : 0,
> > "numUpdateWrites" : 5,
> > "numInserts" : 3,
> > "totalWriteBytes" : 848480,
> > "totalWriteErrors" : 0,
> > "tempPath" : null,
> > "partitionPath" : "2018/05/30",
> > "totalLogRecords" : 0,
> > "totalLogFilesCompacted" : 0,
> > "totalLogSizeCompacted" : 0,
> > "totalUpdatedRecordsCompacted" : 0,
> > "totalLogBlocks" : 0,
> > "totalCorruptLogBlock" : 0,
> > "totalRollbackBlocks" : 0
> > } ],
> > "2018/05/31" : [ {
> > "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
> > "path" : 
> > "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
> > "prevCommit" : "null",
> > "numWrites" : 10430,
> > "numDeletes" : 0,
> > "numUpdateWrites" : 0,
> > "numInserts" : 10430,
> > "totalWriteBytes" : 820723,
> > "totalWriteErrors" : 0,
> > "tempPath" : null,
> > "partitionPath" : "2018/05/31",
> > "totalLogRecords" : 0,
> > "totalLogFilesCompacted" : 0,
> > "totalLogSizeCompacted" : 0,
> > "totalUpdatedRecordsCompacted" : 0,
> > "totalLogBlocks" : 0,
> > "totalCorruptLogBlock" : 0,
> > "totalRollbackBlocks" : 0
> > } ]
> > },
> > "compacted" : false,
> > "extraMetadataMap" : {
> > "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : 
> > {\n \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : 
> > \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" 
> > : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
> > \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n 
> > \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : 
> > \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n 
> > \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
> > \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n 
> > \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : 
> > \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n 
> > \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
> > \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : 
> > \"commit\"\n}"
> > },
> > "extraMetadata" : {
> > "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : 
> > {\n \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : 
> > \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" 
> > : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
> > \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n 
> > \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : 
> > \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n 
> > \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
> > \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n 
> > \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : 
> > \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n 
> > \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n 
> > \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : 
> > \"commit\"\n}"
> > },
> > "totalScanTime" : 0,
> > "totalCreateTime" : 2439,
> > "totalUpsertTime" : 2450,
> > "totalCompactedRecordsUpdated" : 0,
> > "totalLogFilesCompacted" : 0,
> > "totalLogFilesSize" : 0,
> > "fileIdAndRelativePaths" : {
> > "4f5514e8-d57c-4c6e-be8f-c3448051c956" : 
> > "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
> > "39cff0df-24e4-45b8-bff5-9b4f41c4096a" : 
> > "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet"
> > },
> > "totalRecordsDeleted" : 0,
> > "totalLogRecordsCompacted" : 0
> > }
> >
> > 20190702161750.clean:
> > Objavro.schema 
> > {"type":"record","name":"HoodieCleanMetadata","namespace":"com.uber.hoodie.avro.model","fields":[{"name":"startCleanTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"earliestCommitToRetain","type":{"type":"string","avro.java.string":"String"}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieCleanPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"policy","type":{"type":"string","avro.java.string":"String"}},{"name":"deletePathPatterns","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]}
> >
> > 20190702161847.inflight:
> > {
> > "partitionToWriteStats" : {
> > "2018/05/31" : [ {
> > "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
> > "path" : null,
> > "prevCommit" : "20190702161629",
> > "numWrites" : 0,
> > "numDeletes" : 0,
> > "numUpdateWrites" : 2,
> > "numInserts" : 0,
> > "totalWriteBytes" : 0,
> > "totalWriteErrors" : 0,
> > "tempPath" : null,
> > "partitionPath" : null,
> > "totalLogRecords" : 0,
> > "totalLogFilesCompacted" : 0,
> > "totalLogSizeCompacted" : 0,
> > "totalUpdatedRecordsCompacted" : 0,
> > "totalLogBlocks" : 0,
> > "totalCorruptLogBlock" : 0,
> > "totalRollbackBlocks" : 0
> > } ]
> > },
> > "compacted" : false,
> > "extraMetadataMap" : { },
> > "totalScanTime" : 0,
> > "totalCreateTime" : 0,
> > "totalUpsertTime" : 0,
> > "totalCompactedRecordsUpdated" : 0,
> > "totalLogFilesCompacted" : 0,
> > "totalLogFilesSize" : 0,
> > "extraMetadata" : { },
> > "fileIdAndRelativePaths" : {
> > "4f5514e8-d57c-4c6e-be8f-c3448051c956" : null
> > },
> > "totalRecordsDeleted" : 0,
> > "totalLogRecordsCompacted" : 0
> > }
> >
> > 20190702162055.inflight:
> > {
> > "partitionToWriteStats" : { },
> > "compacted" : false,
> > "extraMetadataMap" : { },
> > "totalRecordsDeleted" : 0,
> > "totalLogRecordsCompacted" : 0,
> > "totalScanTime" : 0,
> > "totalCreateTime" : 0,
> > "totalUpsertTime" : 0,
> > "totalCompactedRecordsUpdated" : 0,
> > "totalLogFilesCompacted" : 0,
> > "totalLogFilesSize" : 0,
> > "fileIdAndRelativePaths" : { },
> > "extraMetadata" : { }
> > }
> >
> > On Jul 3 2019, at 8:30 am, Kabeer Ahmed <[email protected]> wrote:
> > > Hi Nishith,
> > >
> > > Thank you for the quick response. I shall try to send the commit metadata 
> > > as soon as possible. I assume the commit metadata you are looking for is 
> > > the one within the .hoodie/ directory and not the archived ones.
> > > There are both inflight and commit metadata files. I take it that you want 
> > > to look at the inflight one. I shall get back to you with further details.
> > > Thanks
> > > Kabeer.
> > >
> > > On Jul 3 2019, at 2:19 am, nishith agarwal <[email protected]> wrote:
> > > > Kabir,
> > > >
> > > > Could you share the content of your commit metadata? You can list the
> > > > timeline, find the latest commit in the timeline, perform a cat and paste
> > > > the results (that you can share).
> > > >
> > > > Thanks,
> > > > Nishith
> > > >
> > > > On Tue, Jul 2, 2019 at 4:53 PM Kabeer Ahmed <[email protected]> 
> > > > wrote:
> > > > > Hi Vinoth and other HUDI Experts,
> > > > > I am stuck while processing inserts into HUDI. The process picks up CSV
> > > > > files and loads them into HUDI. The process seems to be stuck at:
> > > > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679
> > > > > Log is below:
> > > > >
> > > > > 2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
> > > > > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, sizeBytes=435362}]
> > > > > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0
> > > > > The last line in the log shows "unassignedInserts => 8,
> > > > > totalInsertBuckets => 2147483647, recordsPerBucket => 0". With that many
> > > > > insert buckets, the loop below runs for a very long time and eventually
> > > > > causes heap issues.
> > > > >
> > > > > logger.info("After small file assignment: unassignedInserts => " + totalUnassignedInserts
> > > > >     + ", totalInsertBuckets => " + insertBuckets
> > > > >     + ", recordsPerBucket => " + insertRecordsPerBucket);
> > > > > for (int b = 0; b < insertBuckets; b++) {
> > > > >   bucketNumbers.add(totalBuckets);
> > > > >   recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
> > > > >   BucketInfo bucketInfo = new BucketInfo();
> > > > >   bucketInfo.bucketType = BucketType.INSERT;
> > > > >   bucketInfoMap.put(totalBuckets, bucketInfo);
> > > > >   totalBuckets++;
> > > > > }
> > > > > Has anyone seen this issue before? Do I need to file a bug, or is it
> > > > > caused by a misconfiguration on my side?
> > > > >
> > > > > Any help is highly appreciated.
> > > > > Thanks
> > > > > Kabeer.
> > > >
> > >
> >
>
>
