[jira] [Assigned] (HIVE-21771) Support partition filter (where clause) in REPL dump command

mahesh kumar behera (JIRA) Tue, 21 May 2019 20:35:29 -0700


     [ https://issues.apache.org/jira/browse/HIVE-21771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


mahesh kumar behera reassigned HIVE-21771:
------------------------------------------


> Support partition filter (where clause) in REPL dump command
> ------------------------------------------------------------
>
>                 Key: HIVE-21771
>                 URL: https://issues.apache.org/jira/browse/HIVE-21771
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>             Fix For: 4.0.0
>
>
> *Bootstrap for managed table*
> User should be allowed to execute REPL DUMP with a where clause that filters
> partitions out of the dump. The format of the where clause should be similar
> to "REPL DUMP dbname from 10 where t0 where key < 10, t1* where key = 3,
> [t2*,t3] where key > 3". The initial version will support only very basic
> filter conditions; more complex ones will be added later as required.
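A rough Python sketch of how the where-clause format above could be split into per-table filters. This is not Hive's parser; it is an illustration under stated assumptions: each entry is a single `<column> <op> <literal>` comparison (the "very basic" form), `*` in a table name is a wildcard, and a bracketed list such as `[t2*,t3]` groups several table patterns under one condition. `TableFilter`, `split_top_level` and `parse_where_clause` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class TableFilter:
    """One hypothetical '<tables> where <column> <op> <literal>' entry."""
    table_patterns: List[str]  # e.g. ["t2*", "t3"]; "*" assumed to be a wildcard
    column: str
    op: str
    value: str


# Only a single comparison per entry is assumed (the "very basic" form).
_ENTRY = re.compile(r"^\s*(.+?)\s+where\s+(\w+)\s*(<=|>=|!=|=|<|>)\s*(\S+)\s*$")


def split_top_level(clause):
    """Split on commas that are not inside a [...] table list."""
    parts, depth, current = [], 0, []
    for ch in clause:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_where_clause(clause):
    """Turn 't0 where key < 10, [t2*,t3] where key > 3' into TableFilter objects."""
    filters = []
    for part in split_top_level(clause):
        match = _ENTRY.match(part)
        if match is None:
            raise ValueError(f"unsupported filter: {part.strip()!r}")
        tables, column, op, value = match.groups()
        tables = tables.strip()
        if tables.startswith("[") and tables.endswith("]"):
            patterns = [t.strip() for t in tables[1:-1].split(",")]
        else:
            patterns = [tables]
        filters.append(TableFilter(patterns, column, op, value))
    return filters
```

For the example clause this yields three filters, the last one applying `key > 3` to both `t2*` and `t3`. Anything beyond a single comparison is rejected, matching the plan to start with basic conditions only.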
>  * Extract the table information from the AST generated for the where clause.
>  * Generate an AST for each table.
>  * List the partitions of each table from its AST, using the same metastore
> API that SELECT queries use.
>  * During bootstrap, dump only the partitions in this list.
>  * During incremental dump, use the list to filter out events.
> During bootstrap, all the tables of the database will be scanned, and:
>  * If a table is not partitioned, it will be dumped in full.
>  * If the key in a table's filter condition is not a partition column, the
> dump will fail.
>  * If a table is not mentioned in the where clause, all of its partitions
> will be dumped.
>  * Otherwise, only the partitions of the table that satisfy the where clause
> will be dumped.
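The bootstrap rules above can be sketched as a per-table decision. This is a hypothetical illustration, not Hive code: `filters` is assumed to be a dict from a table pattern (matched with `fnmatch`-style wildcards) to a `(column, op, value)` tuple, and partitions are dicts of already-typed partition values. The description has the metastore list the matching partitions through the same API that SELECT queries use; this sketch filters an in-memory list instead, purely to show the decision order.

```python
import operator
from fnmatch import fnmatchcase

# Comparison operators assumed for the "very basic" filter form.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


def find_condition(table, filters):
    """Return the (column, op, value) of the first pattern matching table, or None."""
    for pattern, condition in filters.items():
        if fnmatchcase(table, pattern):  # "*" treated as a wildcard (assumption)
            return condition
    return None


def partitions_to_dump(table, partition_columns, partitions, filters):
    """Apply the bootstrap rules to one table (hypothetical helper).

    Returns None when the table is unpartitioned (the whole table is dumped),
    otherwise the list of partition dicts that should be dumped.
    """
    if not partition_columns:
        return None  # not partitioned: dumped as a whole
    condition = find_condition(table, filters)
    if condition is None:
        return list(partitions)  # not mentioned in the where clause: dump all
    column, op, value = condition
    if column not in partition_columns:
        # Filtering on a non-partition column makes the dump fail.
        raise ValueError(f"{column!r} is not a partition column of {table!r}")
    return [p for p in partitions if OPS[op](p[column], value)]
```

With `filters = {"t0": ("key", "<", 10)}`, a table `t0` partitioned on `key` keeps only partitions with `key < 10`, while an unmentioned table keeps all of its partitions.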
> *Incremental for managed table*
> In an incremental dump, the events in the notification log will be scanned;
> once the partition spec is extracted from an event, it will be checked
> against the filter condition.
>  * If the table is not partitioned, the event will be added to the dump.
>  * If the key is not a partition column, the dump will fail.
>  * If the table is not mentioned in the filter, the event will be added to
> the dump.
>  * If the event spans multiple partitions, it will be added to the dump.
> (Filtering redundant partitions out of the message will be done as a
> separate task.)
>  * If the partition spec matches the filter, the event will be added to the
> dump.
>  
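The incremental rules can be sketched the same way. Again this is hypothetical, not Hive code: an event is reduced to its table name, the table's partition columns and the partition specs it carries, and `filters` uses the same assumed `pattern -> (column, op, value)` shape. The description does not cover events on a partitioned table that carry no partition spec; this sketch keeps them, which is an assumption.

```python
import operator
from fnmatch import fnmatchcase

# Comparison operators assumed for the "very basic" filter form.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


def include_event(table, partition_columns, partition_specs, filters):
    """Decide whether an incremental event goes into the dump (hypothetical helper).

    partition_specs: the partition-spec dicts extracted from the event.
    """
    if not partition_columns:
        return True  # unpartitioned table: always added
    condition = next(
        (c for pattern, c in filters.items() if fnmatchcase(table, pattern)), None)
    if condition is None:
        return True  # table not mentioned in the filter
    column, op, value = condition
    if column not in partition_columns:
        raise ValueError(f"{column!r} is not a partition column of {table!r}")
    if len(partition_specs) != 1:
        # Multi-partition events are kept (trimming them is a separate task);
        # events without a partition spec are also kept here (assumption).
        return True
    return OPS[op](partition_specs[0][column], value)
```

Only single-partition events on a filtered table are ever dropped, which matches the conservative behaviour described for the first version.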



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
