[ https://issues.apache.org/jira/browse/HIVE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964552#comment-13964552 ]
Owen O'Malley commented on HIVE-6319:
-------------------------------------

In the interest of getting 0.13 finalized, I'm +1 except that:
* The temp directory for the compactor should have a unique number.

Other comments that we can fix later:
* AcidUtils.java
** DELTA_DIGITS, BUCKET_DIGITS - you don't need to make it public, just have CompactorTest.addFile not pad the filename.
** BUCKET_DIGIT_PATTERN, LEGACY_BUCKET_DIGIT_PATTERN - i think you can use AcidUtils.parseBaseBucketFilename to get the information you need instead of adding these.
** LEGACY_BUCKET_DIGIT_PATTERN - are you sure that is the right length?
* Cleaner.java
** run
*** can the unlock throw? If so, we can lose the original exception. I'd suggest replacing the finally with a try/catch with the unlock done in both branches.
* CompactorMR.java
** run
*** i assume the sd is either the partition sd for partitioned tables or the table sd for non-partitioned ones. you should probably comment that.
*** do partition sds have the partition directory as their location?
*** move delta check before dealing with the bases
*** i would have expected the delta processing to happen in getinputsplits so that you wouldn't need to serialize as much and it wouldn't need to restat the files.
** CompactorInputFormat.getSplits
*** we should fix the raw reader so that you can just generate a job per a bucket and buckets that don't exist will just get a 0 row iterator.
*** you don't need to build the precise map of deltas for each bucket, the final version of the raw reader will ignore missing files.
** CompactorMap.getWriter
*** you should throw an exception if the min or max txn id isn't set.
** CompactorInputSplit
*** Could use Arrays.asList to build the List.
** StringableList.toString
*** if size() > 0 is redundant

> Insert, update, delete functionality needs a compactor
> ------------------------------------------------------
>
>                 Key: HIVE-6319
>                 URL: https://issues.apache.org/jira/browse/HIVE-6319
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.13.0
>
>         Attachments: 6319.wip.patch, HIVE-6319.patch, HIVE-6319.patch,
>                      HIVE-6319.patch, HIVE-6319.patch, HiveCompactorDesign.pdf
>
>
> In order to keep the number of delta files from spiraling out of control we
> need a compactor to collect these delta files together, and eventually
> rewrite the base file when the deltas get large enough.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
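
For reference, the Cleaner.java point in the comment above (an unlock inside a finally block can hide the original exception) can be illustrated with a small standalone sketch. This is not the Hive code: `UnlockSketch`, `Lock`, `Work`, and both method names are hypothetical stand-ins, and the use of lambdas and `Throwable.addSuppressed` assumes Java 8 or later, which may not match what the patch targets. Logging the unlock failure instead of attaching it as a suppressed exception would serve the same purpose.

```java
class UnlockSketch {
    /** Hypothetical stand-in for the lock held during cleaning; not a Hive class. */
    interface Lock {
        void unlock() throws Exception;
    }

    /** Hypothetical stand-in for the cleaning work itself. */
    interface Work {
        void run() throws Exception;
    }

    // Shape the comment warns about: if the work fails and unlock() also
    // throws, the exception from the finally block replaces the original.
    static void withFinally(Work work, Lock lock) throws Exception {
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    // One reading of the suggestion: unlock in both branches, and on the
    // failure path keep the original exception as the one that propagates.
    static void withTryCatch(Work work, Lock lock) throws Exception {
        try {
            work.run();
        } catch (Exception original) {
            try {
                lock.unlock();
            } catch (Exception unlockFailure) {
                // Record the secondary failure without losing the first one.
                original.addSuppressed(unlockFailure);
            }
            throw original;
        }
        // Success path: an unlock failure here is the only failure, so let it propagate.
        lock.unlock();
    }

    // Runs one of the two shapes with work and unlock that both fail,
    // and reports the message of whichever exception reaches the caller.
    static String failureMessage(boolean useFinally) {
        Work failingWork = () -> { throw new IllegalStateException("clean failed"); };
        Lock failingLock = () -> { throw new IllegalStateException("unlock failed"); };
        try {
            if (useFinally) {
                withFinally(failingWork, failingLock);
            } else {
                withTryCatch(failingWork, failingLock);
            }
            return "no exception";
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureMessage(true));   // prints "unlock failed"
        System.out.println(failureMessage(false));  // prints "clean failed"
    }
}
```

With the finally version the caller only ever sees the unlock failure; with the try/catch version the cleaning failure propagates and the unlock failure is still reachable through `getSuppressed()`.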