[ 
https://issues.apache.org/jira/browse/HIVE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964552#comment-13964552
 ] 

Owen O'Malley commented on HIVE-6319:
-------------------------------------

In the interest of getting 0.13 finalized, I'm +1 except that:
* The temp directory for the compactor should have a unique number.
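In case it helps, roughly what I mean (the sd variable and the prefix are just 
placeholders, not the patch's names):

{code:java}
import java.security.SecureRandom;

import org.apache.hadoop.fs.Path;

// Rough sketch only -- "sd" and the "_tmp_compactor" prefix are placeholders.
// The point is that each compaction run gets a suffix no other run can pick, so
// a retry or a concurrent compaction can never collide on the same temp path.
SecureRandom rand = new SecureRandom();
Path tmpLocation = new Path(sd.getLocation(),
    "_tmp_compactor_" + System.currentTimeMillis() + "_" +
    rand.nextInt(Integer.MAX_VALUE));
{code}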

Other comments that we can fix later:
* AcidUtils.java
** DELTA_DIGITS, BUCKET_DIGITS - you don't need to make them public; just have 
CompactorTest.addFile not pad the filename.
** BUCKET_DIGIT_PATTERN, LEGACY_BUCKET_DIGIT_PATTERN - I think you can use 
AcidUtils.parseBaseBucketFilename to get the information you need instead of 
adding these (rough sketch below).
** LEGACY_BUCKET_DIGIT_PATTERN - are you sure that is the right length?
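For the pattern question, roughly what I have in mind (I'm going from memory on 
the parseBaseBucketFilename signature, so treat this as a sketch, not gospel):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
import org.apache.hadoop.hive.ql.io.AcidUtils;

// Sketch only -- this assumes parseBaseBucketFilename(Path, Configuration) hands
// back an AcidOutputFormat.Options with the bucket id filled in. If that holds,
// the extra regex constants shouldn't be needed.
AcidOutputFormat.Options opts = AcidUtils.parseBaseBucketFilename(bucketFile, conf);
int bucket = opts.getBucket();
{code}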
* Cleaner.java
** run
*** Can the unlock throw? If so, we can lose the original exception. I'd
    suggest replacing the finally with a try/catch, with the unlock done in
    both branches (sketch below).
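The shape I'd suggest (clean() and unlock() stand in for whatever the cleaner 
really calls):

{code:java}
// Sketch only. Unlocking in both branches, rather than in a finally, means a
// failure inside unlock() can't mask the original exception from the cleaning
// work itself.
try {
  clean(ci);
  unlock(lockHandle);
} catch (Exception e) {
  LOG.error("Cleaning failed for " + ci, e);
  try {
    unlock(lockHandle);
  } catch (Exception e2) {
    LOG.error("Also failed to release the lock after the failed cleaning", e2);
  }
}
{code}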
* CompactorMR.java 
** run
*** I assume the sd is either the partition's storage descriptor for 
partitioned tables or the table's for non-partitioned ones; you should probably 
say so in a comment (see the javadoc sketch below).
*** Do partition sds have the partition directory as their location?
*** Move the delta check before dealing with the bases.
*** I would have expected the delta processing to happen in getSplits, so that 
you wouldn't need to serialize as much and it wouldn't need to re-stat the 
files.
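For the sd point, even just a javadoc along these lines would help the next 
reader (wording is only a suggestion):

{code:java}
/**
 * Run a compaction.
 * @param sd storage descriptor of the data being compacted: the partition's sd
 *           for a partitioned table, the table's sd for a non-partitioned one.
 */
{code}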
** CompactorInputFormat.getSplits
*** We should fix the raw reader so that you can just generate a job per 
bucket, and buckets that don't exist will simply get a 0-row iterator.
*** You don't need to build the precise map of deltas for each bucket; the 
final version of the raw reader will ignore missing files (sketch below).
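What I was picturing, once the raw reader tolerates missing bucket files (the 
split constructor here is made up for illustration, not the patch's actual 
signature):

{code:java}
// Sketch only -- one split per bucket id, always handed the full list of deltas.
// A bucket with no file in some delta just yields a 0-row reader, so there is no
// need to stat files or to keep a per-bucket map of deltas.
for (int bucket = 0; bucket < numBuckets; bucket++) {
  splits.add(new CompactorInputSplit(conf, bucket, baseDir, deltaDirs));
}
{code}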
** CompactorMap.getWriter
*** You should throw an exception if the min or max txn id isn't set (e.g. the 
sketch below).
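Something like (field names illustrative):

{code:java}
// Sketch only -- minTxn/maxTxn stand in for however the map task carries the
// range. Failing fast beats writing a base/delta directory whose name encodes a
// bogus transaction id range.
if (minTxn < 0 || maxTxn < 0) {
  throw new IllegalStateException(
      "Transaction range for compaction not set: min=" + minTxn + " max=" + maxTxn);
}
{code}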
** CompactorInputSplit
*** Could use Arrays.asList to build the List.
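i.e. something along the lines of:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only -- field and array names are illustrative. Arrays.asList returns a
// fixed-size view, so copy it into an ArrayList if the split ever mutates the list.
List<Path> deltaDirs = new ArrayList<Path>(Arrays.asList(deltas));
{code}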
** StringableList.toString
*** The if (size() > 0) check is redundant (see the sketch below).
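i.e. the loop already covers the empty case (the element encoding here is only 
illustrative):

{code:java}
// Sketch only -- whatever per-element encoding toString really uses, an empty
// list just never enters the loop, so the surrounding "if (size() > 0)" adds
// nothing.
StringBuilder buf = new StringBuilder();
buf.append(size());
for (Path p : this) {
  buf.append(':');
  buf.append(p.toString());
}
return buf.toString();
{code}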

> Insert, update, delete functionality needs a compactor
> ------------------------------------------------------
>
>                 Key: HIVE-6319
>                 URL: https://issues.apache.org/jira/browse/HIVE-6319
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.13.0
>
>         Attachments: 6319.wip.patch, HIVE-6319.patch, HIVE-6319.patch, 
> HIVE-6319.patch, HIVE-6319.patch, HiveCompactorDesign.pdf
>
>
> In order to keep the number of delta files from spiraling out of control we 
> need a compactor to collect these delta files together, and eventually 
> rewrite the base file when the deltas get large enough.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
