Thanks for finding the old issue. Looks like we replied around the same time
:) Yeah, makes sense. The fix would probably be to find the newest instant with
non-zero records written and then use it for the average record size calculation.
Let us know if you are interested in working on the fix.
Balaji.V
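A minimal sketch of the fix described above, assuming each completed instant can report its total bytes and total records written. The `InstantStats` record and `averageRecordSize` method are hypothetical illustrations, not Hudi's API, and the 1024-byte default is an assumed fallback estimate. The sketch walks the instants from newest to oldest, skips any instant that wrote zero records (it also requires a positive byte count so the estimate is never zero), and falls back to the default when no instant qualifies. It needs Java 16+ for `record`.

```java
import java.util.List;

public class AvgRecordSizeSketch {

    // Hypothetical per-instant summary; not a Hudi class.
    record InstantStats(String instantTime, long totalBytesWritten, long totalRecordsWritten) {}

    /**
     * Returns the average record size of the newest instant that wrote records,
     * or defaultEstimate if none did. Instants are assumed ordered oldest to newest.
     */
    static long averageRecordSize(List<InstantStats> instants, long defaultEstimate) {
        for (int i = instants.size() - 1; i >= 0; i--) {
            InstantStats s = instants.get(i);
            // Skip instants that would divide by zero or yield a zero average.
            if (s.totalRecordsWritten() > 0 && s.totalBytesWritten() > 0) {
                return (long) Math.ceil((double) s.totalBytesWritten() / s.totalRecordsWritten());
            }
        }
        return defaultEstimate;
    }

    public static void main(String[] args) {
        List<InstantStats> timeline = List.of(
            // Totals of the two write stats in the 20190702161629 commit quoted further down.
            new InstantStats("20190702161629", 1669203L, 22044L),
            // Hypothetical newer instant that wrote no records.
            new InstantStats("20190702162055", 0L, 0L));
        System.out.println(averageRecordSize(timeline, 1024L)); // 76
    }
}
```

Here the empty newest instant is skipped and the estimate comes from the 20190702161629 commit (1669203 / 22044 is about 75.7, rounded up to 76).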
On Wednesday, July 3, 2019, 8:25:33 AM PDT, Kabeer Ahmed wrote:
Hi Nishith and All,
I think I figured out what was blocking the processing of files for me. The
code snippet that I had sent before is indeed the issue. I have raised a new
issue and added a few details at:
https://github.com/apache/incubator-hudi/issues/776
Can someone please have a look and advise what is going wrong?
Thank you,
Kabeer.
On Jul 3 2019, at 11:38 am, Kabeer Ahmed wrote:
> Hi Nishith,
>
> Please find the latest commit data in the gist at:
> https://gist.github.com/smdahmed/5d811cb4833243a11ac09b9dc61e5b4d
> For your convenience, I have also copy-pasted it below. Any help is highly
> appreciated.
>
> Thanks
> Kabeer.
>
> 20190702161629.commit:
> {
> "partitionToWriteStats" : {
> "2018/05/30" : [ {
> "fileId" : "39cff0df-24e4-45b8-bff5-9b4f41c4096a",
> "path" :
> "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet",
> "prevCommit" : "20190702161417",
> "numWrites" : 11614,
> "numDeletes" : 0,
> "numUpdateWrites" : 5,
> "numInserts" : 3,
> "totalWriteBytes" : 848480,
> "totalWriteErrors" : 0,
> "tempPath" : null,
> "partitionPath" : "2018/05/30",
> "totalLogRecords" : 0,
> "totalLogFilesCompacted" : 0,
> "totalLogSizeCompacted" : 0,
> "totalUpdatedRecordsCompacted" : 0,
> "totalLogBlocks" : 0,
> "totalCorruptLogBlock" : 0,
> "totalRollbackBlocks" : 0
> } ],
> "2018/05/31" : [ {
> "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
> "path" :
> "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
> "prevCommit" : "null",
> "numWrites" : 10430,
> "numDeletes" : 0,
> "numUpdateWrites" : 0,
> "numInserts" : 10430,
> "totalWriteBytes" : 820723,
> "totalWriteErrors" : 0,
> "tempPath" : null,
> "partitionPath" : "2018/05/31",
> "totalLogRecords" : 0,
> "totalLogFilesCompacted" : 0,
> "totalLogSizeCompacted" : 0,
> "totalUpdatedRecordsCompacted" : 0,
> "totalLogBlocks" : 0,
> "totalCorruptLogBlock" : 0,
> "totalRollbackBlocks" : 0
> } ]
> },
> "compacted" : false,
> "extraMetadataMap" : {
> "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n
> \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" :
> \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" :
> 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n
> \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n
> \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" :
> \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n
> \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n
> \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n
> \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" :
> \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n
> \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n
> \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" :
> \"commit\"\n}"
> },
> "extraMetadata" : {
> "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n
> \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" :
> \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" :
> 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n
> \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n
> \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" :
> \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n
> \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n
> \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n
> \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" :
> \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n
> \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n
> \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" :
> \"commit\"\n}"
> },
> "totalScanTime" : 0,
> "totalCreateTime" : 2439,
> "totalUpsertTime" : 2450,
> "totalCompactedRecordsUpdated" : 0,
> "totalLogFilesCompacted" : 0,
> "totalLogFilesSize" : 0,
> "fileIdAndRelativePaths" : {
> "4f5514e8-d57c-4c6e-be8f-c3448051c956" :
> "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet",
> "39cff0df-24e4-45b8-bff5-9b4f41c4096a" :
> "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet"
> },
> "totalRecordsDeleted" : 0,
> "totalLogRecordsCompacted" : 0
> }
>
> 20190702161750.clean:
> Objavro.schema
> {"type":"record","name":"HoodieCleanMetadata","namespace":"com.uber.hoodie.avro.model","fields":[{"name":"startCleanTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"earliestCommitToRetain","type":{"type":"string","avro.java.string":"String"}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieCleanPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"policy","type":{"type":"string","avro.java.string":"String"}},{"name":"deletePathPatterns","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]}
>
> 20190702161847.inflight:
> {
> "partitionToWriteStats" : {
> "2018/05/31" : [ {
> "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956",
> "path" : null,
> "prevCommit" : "20190702161629",
> "numWrites" : 0,
> "numDeletes" : 0,
> "numUpdateWrites" : 2,
> "numInserts" : 0,
> "totalWriteBytes" : 0,
> "totalWriteErrors" : 0,
> "tempPath" : null,
> "partitionPath" : null,
> "totalLogRecords" : 0,
> "totalLogFilesCompacted" : 0,
> "totalLogSizeCompacted" : 0,
> "totalUpdatedRecordsCompacted" : 0,
> "totalLogBlocks" : 0,
> "totalCorruptLogBlock" : 0,
> "totalRollbackBlocks" : 0
> } ]
> },
> "compacted" : false,
> "extraMetadataMap" : { },
> "totalScanTime" : 0,
> "totalCreateTime" : 0,
> "totalUpsertTime" : 0,
> "totalCompactedRecordsUpdated" : 0,
> "totalLogFilesCompacted" : 0,
> "totalLogFilesSize" : 0,
> "extraMetadata" : { },
> "fileIdAndRelativePaths" : {
> "4f5514e8-d57c-4c6e-be8f-c3448051c956" : null
> },
> "totalRecordsDeleted" : 0,
> "totalLogRecordsCompacted" : 0
> }
>
> 20190702162055.inflight:
>
> {
> "partitionToWriteStats" : { },
> "compacted" : false,
> "extraMetadataMap" : { },
> "totalRecordsDeleted" : 0,
> "totalLogRecordsCompacted" : 0,
> "totalScanTime" : 0,
> "totalCreateTime" : 0,
> "totalUpsertTime" : 0,
> "totalCompactedRecordsUpdated" : 0,
> "totalLogFilesCompacted" : 0,
> "totalLogFilesSize" : 0,
> "fileIdAndRelativePaths" : { },
> "extraMetadata" : { }
> }
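To put the pasted metadata in numbers, the two write stats in the 20190702161629 commit can be summed into per-commit totals: records from `numWrites` and bytes from `totalWriteBytes`. This is a sketch with a hypothetical `WriteStat` record (Java 16+); that Hudi derives its per-commit totals exactly this way is an assumption.

```java
import java.util.List;

public class CommitTotalsSketch {

    // Hypothetical subset of one entry in partitionToWriteStats.
    record WriteStat(String partitionPath, long numWrites, long totalWriteBytes) {}

    static long totalRecords(List<WriteStat> stats) {
        return stats.stream().mapToLong(WriteStat::numWrites).sum();
    }

    static long totalBytes(List<WriteStat> stats) {
        return stats.stream().mapToLong(WriteStat::totalWriteBytes).sum();
    }

    public static void main(String[] args) {
        // Values copied from the 20190702161629.commit file above.
        List<WriteStat> stats = List.of(
            new WriteStat("2018/05/30", 11614L, 848480L),
            new WriteStat("2018/05/31", 10430L, 820723L));
        long records = totalRecords(stats); // 22044
        long bytes = totalBytes(stats);     // 1669203
        System.out.println("records=" + records + " bytes=" + bytes
            + " avgBytesPerRecord=" + ((double) bytes / records));
    }
}
```

By contrast, the 20190702161847 inflight file reports `numWrites` 0 and `totalWriteBytes` 0, so the same sums over it are both zero.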
>
> On Jul 3 2019, at 8:30 am, Kabeer Ahmed wrote:
> > Hi Nishith,
> >
> > Thank you for the quick response. I shall try to send the commit metadata
> > at the earliest. I assume the commit metadata you are looking for is the
> > one within the .hoodie/ directory and not the archived ones.
> > There are both inflight and commit metadata files; I am assuming you want
> > to look at the inflight one. I shall get back to you with further details.
> > Thanks
> > Kabeer.
> >
> > On Jul 3 2019, at 2:19 am, nishith agarwal wrote:
> > > Kabeer,
> > >
> > > Could you share the contents of your commit metadata? You can list the
> > > timeline, find the latest commit in it, cat that file and paste the
> > > output (whatever you are able to share).
> > >
> > > Thanks,
> > > Nishith
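The steps Nishith describes (list the timeline, pick the latest commit, cat it) can be sketched in Java, assuming the table's `.hoodie` directory is reachable on a local filesystem; on HDFS the equivalent would be `hdfs dfs -ls` followed by `hdfs dfs -cat`. Instant file names start with a yyyyMMddHHmmss timestamp (e.g. 20190702161629), so the lexicographically greatest `*.commit` name is the latest completed commit, and `.inflight` and `.clean` files are skipped. The class name and the default path are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class LatestCommitSketch {

    /** Returns the newest completed "<instantTime>.commit" file in the given .hoodie directory. */
    static Optional<Path> latestCommit(Path hoodieDir) throws IOException {
        try (Stream<Path> files = Files.list(hoodieDir)) {
            return files
                .filter(p -> p.getFileName().toString().endsWith(".commit"))
                // Timestamp prefixes sort chronologically when compared as strings.
                .max(Comparator.comparing((Path p) -> p.getFileName().toString()));
        }
    }

    public static void main(String[] args) throws IOException {
        Path hoodieDir = Paths.get(args.length > 0 ? args[0] : ".hoodie");
        Optional<Path> latest = latestCommit(hoodieDir);
        if (latest.isPresent()) {
            System.out.println("== " + latest.get().getFileName());
            // Equivalent of "cat": print the commit metadata JSON.
            System.out.println(Files.readString(latest.get()));
        } else {
            System.out.println("No completed commits found in " + hoodieDir);
        }
    }
}
```

Because only `.commit` files are considered, this answers the "inflight or commit?" question in favor of the latest completed commit; listing `.inflight` files the same way would show pending instants.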
> > >
> > > On Tue, Jul 2, 2019 at 4:53 PM Kabeer Ahmed wrote:
> > > > Hi Vinoth and other HUDI Experts,
> > > > I am stuck while processing inserts into HUDI. The process picks up CSV
> > > > files and loads them into HUDI. The process seems to be stuck at:
> > > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679
> > > > Log is below:
> > > >
> > > > 2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
> > > > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, sizeBytes=435362}]
> > > > 2019-07-02 22:43:31,969 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket => 0
> > > > Looking at the last line of the log, "unassignedInserts => 8,
> > > > totalInsertBuckets => 2147483647, recordsPerBucket => 0": these values
> > > > make the loop below run roughly 2.1 billion times, eventually exhausting
> > > > the heap.
> > > >
> > > > logger.info(
> > > >     "After small file assignment: unassignedInserts => " + totalUnassignedInserts
> > > >         + ", totalInsertBuckets => " + insertBuckets
> > > >         + ", recordsPerBucket => " + insertRecordsPerBucket);
> > > > for (int b = 0; b < insertBuckets; b++) {
> > > >     bucketNumbers.add(totalBuckets);
> > > >     recordsPerBucket.add(totalUnassignedInserts / insertBuckets);
> > > >     BucketInfo bucketInfo = new BucketInfo();
> > > >     bucketInfo.bucketType = BucketType.INSERT;
> > > >     bucketInfoMap.put(totalBuckets, bucketInfo);
> > > >     totalBuckets++;
> > > > }
> > > > Has anyone seen this issue? Do I need to file a bug, or is it caused by
> > > > a misconfiguration on my side?
> > > >
> > > > Any help is highly appreciated.
> > > > Thanks
> > > > Kabeer.
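The two logged values 9223372036854775807 and 2147483647 are exactly `Long.MAX_VALUE` and `Integer.MAX_VALUE`, which is what Java's narrowing casts produce from positive infinity. The sketch below reproduces that chain under stated assumptions: that the average record size is computed as bytes divided by records in floating point, that the commit used for the estimate reported a positive byte count with zero records, and that the bucket count is a rounded-up floating-point quotient. The method names, the 848480-byte stand-in value and the 120 MB max file size are illustrative, not taken from Hudi.

```java
public class BucketOverflowSketch {

    // Assumed formula: average record size = bytes / records, rounded up, in floating point.
    static long avgRecordSize(long totalBytesWritten, long totalRecordsWritten) {
        return (long) Math.ceil((1.0 * totalBytesWritten) / totalRecordsWritten);
    }

    // Assumed formula: insert buckets = inserts / recordsPerBucket, rounded up, in floating point.
    static int insertBuckets(long unassignedInserts, long recordsPerBucket) {
        return (int) Math.ceil((1.0 * unassignedInserts) / recordsPerBucket);
    }

    public static void main(String[] args) {
        // Assumed: the commit used for the estimate reports bytes but zero records.
        // 848480.0 / 0 is +Infinity, and (long) +Infinity is Long.MAX_VALUE.
        long avg = avgRecordSize(848480L, 0L);
        System.out.println("AvgRecordSize => " + avg); // 9223372036854775807

        // Illustrative 120 MB max file size; any size below Long.MAX_VALUE gives 0.
        long maxFileSize = 120L * 1024 * 1024;
        long recordsPerBucket = maxFileSize / avg; // integer division: 0

        // 8.0 / 0 is +Infinity again, and (int) +Infinity is Integer.MAX_VALUE.
        int buckets = insertBuckets(8L, recordsPerBucket);
        System.out.println("totalInsertBuckets => " + buckets
            + ", recordsPerBucket => " + recordsPerBucket); // 2147483647, 0
    }
}
```

With `insertBuckets` at `Integer.MAX_VALUE`, the quoted `for` loop allocates a `BucketInfo` and map entry per iteration, which matches the heap exhaustion described. Checking for a positive record count before dividing, as in the fix proposed at the top of the thread, avoids both infinities.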
> > >
> > >
> >
> >
>
>