Kabeer, I have never seen this issue before. 2019-07-02 22:43:31,875 [main] INFO com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => 9223372036854775807
The above average record size looks wrong. It is Long.MAX_VALUE. The logic for calculating average record size is in https://github.com/apache/incubator-hudi/blob/9f18a1ca80ec1d08253688d9b4d4538a86068559/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L742 >From the code, this could happen if no records were written in previously >committed instant (.commit). Can you check if you see empty commit lying >around inĀ .hoodie fodler. On Wednesday, July 3, 2019, 3:39:15 AM PDT, Kabeer Ahmed <[email protected]> wrote: Hi Nishith, Please find the latest commit data in the gist at: https://gist.github.com/smdahmed/5d811cb4833243a11ac09b9dc61e5b4d (https://link.getmailspring.com/link/[email protected]/0?redirect=https%3A%2F%2Fgist.github.com%2Fsmdahmed%2F5d811cb4833243a11ac09b9dc61e5b4d&recipient=ZGV2QGh1ZGkuYXBhY2hlLm9yZw%3D%3D) For your convenience, I have also copy pasted it below. Any help is highly appreciated. Thanks Kabeer. 20190702161629.commit: { "partitionToWriteStats" : { "2018/05/30" : [ { "fileId" : "39cff0df-24e4-45b8-bff5-9b4f41c4096a", "path" : "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet", "prevCommit" : "20190702161417", "numWrites" : 11614, "numDeletes" : 0, "numUpdateWrites" : 5, "numInserts" : 3, "totalWriteBytes" : 848480, "totalWriteErrors" : 0, "tempPath" : null, "partitionPath" : "2018/05/30", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0 } ], "2018/05/31" : [ { "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956", "path" : "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet", "prevCommit" : "null", "numWrites" : 10430, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 10430, "totalWriteBytes" : 820723, "totalWriteErrors" : 0, "tempPath" : null, "partitionPath" : "2018/05/31", "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0 } ] }, "compacted" : false, "extraMetadataMap" : { "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : \"commit\"\n}" }, "extraMetadata" : { "ROLLING_STAT" : "{\n \"partitionToRollingStats\" : {\n \"2018/05/29\" : {\n \"235bd794-790b-48e7-b9ea-956149db1dce\" : {\n \"fileId\" : \"235bd794-790b-48e7-b9ea-956149db1dce\",\n \"inserts\" : 2,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 443797\n }\n },\n \"2018/05/30\" : {\n \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\" : {\n \"fileId\" : \"39cff0df-24e4-45b8-bff5-9b4f41c4096a\",\n \"inserts\" : 23220,\n \"upserts\" : 5,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 848282\n }\n },\n \"2018/05/31\" : {\n \"4f5514e8-d57c-4c6e-be8f-c3448051c956\" : {\n \"fileId\" : \"4f5514e8-d57c-4c6e-be8f-c3448051c956\",\n \"inserts\" : 10430,\n \"upserts\" : 0,\n \"deletes\" : 0,\n \"totalInputWriteBytesToDisk\" : 0,\n \"totalInputWriteBytesOnDisk\" : 820723\n }\n }\n },\n \"actionType\" : \"commit\"\n}" }, "totalScanTime" : 0, "totalCreateTime" : 2439, "totalUpsertTime" : 2450, "totalCompactedRecordsUpdated" : 0, "totalLogFilesCompacted" : 0, "totalLogFilesSize" : 0, "fileIdAndRelativePaths" : { "4f5514e8-d57c-4c6e-be8f-c3448051c956" : "2018/05/31/4f5514e8-d57c-4c6e-be8f-c3448051c956_1_20190702161629.parquet", "39cff0df-24e4-45b8-bff5-9b4f41c4096a" : "2018/05/30/39cff0df-24e4-45b8-bff5-9b4f41c4096a_0_20190702161629.parquet" }, "totalRecordsDeleted" : 0, "totalLogRecordsCompacted" : 0 } 20190702161750.clean: Objavro.schema {"type":"record","name":"HoodieCleanMetadata","namespace":"com.uber.hoodie.avro.model","fields":[{"name":"startCleanTime","type":{"type":"string","avro.java.string":"String"}},{"name":"timeTakenInMillis","type":"long"},{"name":"totalFilesDeleted","type":"int"},{"name":"earliestCommitToRetain","type":{"type":"string","avro.java.string":"String"}},{"name":"partitionMetadata","type":{"type":"map","values":{"type":"record","name":"HoodieCleanPartitionMetadata","fields":[{"name":"partitionPath","type":{"type":"string","avro.java.string":"String"}},{"name":"policy","type":{"type":"string","avro.java.string":"String"}},{"name":"deletePathPatterns","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"successDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}},{"name":"failedDeleteFiles","type":{"type":"array","items":{"type":"string","avro.java.string":"String"}}}]},"avro.java.string":"String"}}]} 20190702161847.inflight: { "partitionToWriteStats" : { "2018/05/31" : [ { "fileId" : "4f5514e8-d57c-4c6e-be8f-c3448051c956", "path" : null, "prevCommit" : "20190702161629", "numWrites" : 0, "numDeletes" : 0, "numUpdateWrites" : 2, "numInserts" : 0, "totalWriteBytes" : 0, "totalWriteErrors" : 0, "tempPath" : null, "partitionPath" : null, "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0 } ] }, "compacted" : false, "extraMetadataMap" : { }, "totalScanTime" : 0, "totalCreateTime" : 0, "totalUpsertTime" : 0, "totalCompactedRecordsUpdated" : 0, "totalLogFilesCompacted" : 0, "totalLogFilesSize" : 0, "extraMetadata" : { }, "fileIdAndRelativePaths" : { "4f5514e8-d57c-4c6e-be8f-c3448051c956" : null }, "totalRecordsDeleted" : 0, "totalLogRecordsCompacted" : 0 } 20190702162055.inflight: { "partitionToWriteStats" : { }, "compacted" : false, "extraMetadataMap" : { }, "totalRecordsDeleted" : 0, "totalLogRecordsCompacted" : 0, "totalScanTime" : 0, "totalCreateTime" : 0, "totalUpsertTime" : 0, "totalCompactedRecordsUpdated" : 0, "totalLogFilesCompacted" : 0, "totalLogFilesSize" : 0, "fileIdAndRelativePaths" : { }, "extraMetadata" : { } } On Jul 3 2019, at 8:30 am, Kabeer Ahmed <[email protected]> wrote: > Hi Nishith, > > Thank you for the quick respnose. I shall try to send the commit metadata at > the earliest. I hope the commit metadata you are looking for is the one > within .hoodie/ directory and not the ones that is archived. > And there are inflight and commit metadata. I am taking that you want to look > into the one inflight. Shall revert back with further details. > Thanks > Kabeer. > > On Jul 3 2019, at 2:19 am, nishith agarwal <[email protected]> wrote: > > Kabir, > > > > Could you share the content of your commit metadata ? You can list the > > timeline, find the latest commit in the timeline, perform a cat and paste > > the results (that you can share). > > > > Thanks, > > Nishith > > > > On Tue, Jul 2, 2019 at 4:53 PM Kabeer Ahmed <[email protected]> wrote: > > > Hi Vinoth and other HUDI Experts, > > > I am stuck while processing inserts into HUDI. The process picks up CSV > > > files and loads them into HUDI. The process seems to be stuck at: > > > https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/table/HoodieCopyOnWriteTable.java#L679 > > > Log is below: > > > > > > 2019-07-02 22:43:31,875 [main] INFO > > > com.uber.hoodie.table.HoodieCopyOnWriteTable - AvgRecordSize => > > > 9223372036854775807 > > > 2019-07-02 22:43:31,969 [main] INFO > > > com.uber.hoodie.table.HoodieCopyOnWriteTable - For partitionPath : > > > 2018/05/30 Small Files => [SmallFile {location=HoodieRecordLocation > > > {commitTime=20190702161750, fileId=39cff0df-24e4-45b8-bff5-9b4f41c4096a}, > > > sizeBytes=435362}] > > > 2019-07-02 22:43:31,969 [main] INFO > > > com.uber.hoodie.table.HoodieCopyOnWriteTable - After small file > > > assignment: > > > unassignedInserts => 8, totalInsertBuckets => 2147483647, recordsPerBucket > > > => 0 > > > Looking at the last line in the log: "unassignedInserts => 8, > > > totalInsertBuckets => 2147483647, recordsPerBucket => 0", this causes the > > > below code to loop for quite long causing heap issues. > > > > > > logger.info( > > > "After small file assignment: unassignedInserts => " + > > > totalUnassignedInserts > > > + ", totalInsertBuckets => " + insertBuckets + ", recordsPerBucket => " > > > + insertRecordsPerBucket); > > > for (int b = 0; b < insertBuckets; b++) { > > > bucketNumbers.add(totalBuckets); > > > recordsPerBucket.add(totalUnassignedInserts / insertBuckets); > > > BucketInfo bucketInfo = new BucketInfo(); > > > bucketInfo.bucketType = BucketType.INSERT; > > > bucketInfoMap.put(totalBuckets, bucketInfo); > > > totalBuckets++; > > > } > > > Has someone seen the issue? Do I need to file a bug or it is something to > > > do with my misconfiguration? > > > > > > Any help is highly appreciated. > > > Thanks > > > Kabeer. > > > > > >
