[ 
https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429094#comment-16429094
 ] 

Sergey Shelukhin edited comment on HIVE-19124 at 4/6/18 10:40 PM:
------------------------------------------------------------------

The watermark issue for the compactor specifically could probably be addressed by:
1) Modifying the txn list in recordValidWriteIds in Driver based on the flag. We 
create the driver ourselves, so the flag can be set directly, no shenanigans 
necessary. Compactor write IDs can then be created and serialized so that the 
query reads only the data we want.
2) When renaming the directory, generating the final name in the commit method, 
AFTER the query, based on the write IDs that the driver actually used.
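
The two steps above can be sketched roughly as follows. This is only an illustration, not the patch: the class and method names are hypothetical and are not part of Hive's API, the rule of capping the watermark just below the lowest open write ID is an assumption about what "only read the data we want" means for the compactor, and the base_ prefix with a 7-digit zero-padded write ID is assumed to mirror Hive's directory naming.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Hive's API. */
final class CompactorWatermarkSketch {
    private CompactorWatermarkSketch() {}

    /**
     * Step 1 (assumed rule): cap the high watermark just below the lowest
     * open write ID, so the compaction query reads only committed data.
     * With no open write IDs the original high watermark is kept.
     */
    static long cappedHighWatermark(long highWatermark, long[] openWriteIds) {
        long minOpen = Arrays.stream(openWriteIds).min().orElse(Long.MAX_VALUE);
        return Math.min(highWatermark, minOpen - 1);
    }

    /**
     * Step 2: build the final directory name AFTER the query has run, from
     * the write ID the query actually used (assumed naming: base_ followed
     * by the write ID zero-padded to 7 digits).
     */
    static String finalBaseDirName(long writeIdUsed) {
        return String.format(Locale.ROOT, "base_%07d", writeIdUsed);
    }
}
```

For example, with a high watermark of 10 and open write IDs 6 and 8, the capped watermark is 5, and the directory would only be renamed to base_0000005 in the commit step.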

That way we don't even need any UDFs or INPUT_FILE_NAME stuff and it will work 
just like that.
I'm not sure I'll have enough time to finish this today and I'm out next week, 
but I'll attach a WIP patch. 

For insert overwrite outside of compaction this won't work, because there we do 
need to overwrite deltas above the watermark that have already committed, but not 
the ones still in progress, so the base would need to be discontinuous. For 
compaction we don't need that.
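
To make the "discontinuous base" point concrete, here is a small sketch. It is hypothetical code, not Hive's API, and it simplifies by treating every write ID that is not open as committed (aborted writes are ignored). It lists the contiguous ranges of committed write IDs up to the high watermark; more than one range means a single base_N directory could not describe what an insert overwrite would have to replace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Hive's API. */
final class CommittedRangesSketch {
    private CommittedRangesSketch() {}

    /**
     * Returns inclusive [start, end] ranges of write IDs in 1..highWatermark
     * that are not open. Simplification: "not open" is treated as committed.
     */
    static List<long[]> committedRanges(long highWatermark, Set<Long> openWriteIds) {
        List<long[]> ranges = new ArrayList<>();
        long start = -1; // -1 means no range is currently being built
        for (long id = 1; id <= highWatermark; id++) {
            boolean committed = !openWriteIds.contains(id);
            if (committed && start < 0) {
                start = id; // a new committed range begins here
            } else if (!committed && start >= 0) {
                ranges.add(new long[] {start, id - 1}); // an open ID ends the range
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new long[] {start, highWatermark});
        }
        return ranges;
    }
}
```

With a high watermark of 10 and write ID 6 still open, this yields two ranges, 1-5 and 7-10. Compaction only needs the first range, which is contiguous; insert overwrite would need both.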



> implement a basic major compactor for MM tables
> -----------------------------------------------
>
>                 Key: HIVE-19124
>                 URL: https://issues.apache.org/jira/browse/HIVE-19124
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>              Labels: mm-gap-2
>         Attachments: HIVE-19124.01.patch, HIVE-19124.patch
>
>
> For now, it will run a query directly and only major compactions will be 
> supported.



