[ https://issues.apache.org/jira/browse/HIVE-21771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mahesh kumar behera reassigned HIVE-21771:
------------------------------------------
> Support partition filter (where clause) in REPL dump command
> ------------------------------------------------------------
>
> Key: HIVE-21771
> URL: https://issues.apache.org/jira/browse/HIVE-21771
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2, repl
> Affects Versions: 4.0.0
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Fix For: 4.0.0
>
>
> *Bootstrap for managed table*
> Users should be allowed to execute REPL DUMP with a where clause. The where
> clause should support filtering partitions out of the dump. The format of the
> where clause should be similar to "REPL DUMP dbname from 10 where t0 where
> key < 10, t1* where key = 3, [t2*,t3] where key > 3". For the initial version,
> only very basic filter conditions will be supported; support for more complex
> conditions will be added as and when required.
> * From the AST generated for the where clause, extract the table information.
> * Generate an AST for each table.
> * List the partitions of each table using that table's AST, via the same
> metastore API used by select queries.
> * During a bootstrap dump, use the partition list to dump the partitions.
> * During an incremental dump, use the list to filter the events.
> During a bootstrap dump, all the tables of the database will be scanned, and:
> * If a table is not partitioned, it will be dumped.
> * If the key provided in the filter condition for a table is not a partition
> column, the dump will fail.
> * If a table is not mentioned in the where clause, all of its partitions will
> be dumped.
> * All partitions of the table that satisfy the where clause will be dumped.
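The bootstrap rules above can be expressed as one per-table decision. This is a minimal sketch, not Hive code. It assumes a condition is a hypothetical `(column, op, value)` tuple, that partition specs are dicts whose values are already integers (Hive itself stores partition values as strings), and that the rules are applied in the order listed. `partitions_to_dump` is a hypothetical helper, and its `None` return value stands for "dump the whole unpartitioned table".

```python
import operator

# Assumed comparison operators for the "very basic" filter conditions.
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


def partitions_to_dump(partition_columns, partitions, condition):
    """Apply the bootstrap rules to one table.

    partition_columns: partition column names (empty if not partitioned)
    partitions: partition specs of the table, as dicts column -> value
    condition: (column, op, value) for this table, or None if not mentioned
    Returns None for "dump the whole table", else the specs to dump.
    """
    if not partition_columns:
        return None  # not partitioned: dump the table
    if condition is None:
        return list(partitions)  # not mentioned: dump every partition
    column, op, value = condition
    if column not in partition_columns:
        # Filtering on a non-partition column makes the dump fail.
        raise ValueError(f"{column!r} is not a partition column")
    # Keep only the partitions that satisfy the condition.
    return [p for p in partitions if _OPS[op](p[column], value)]
```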
> *Incremental for managed table*
> During an incremental dump, the events in the notification log will be
> scanned; once the partition spec is extracted from an event, it will be
> checked against the filter condition.
> * If the table is not partitioned, the event will be added to the dump.
> * If the key mentioned is not a partition column, the dump will fail.
> * If the table is not mentioned in the filter, the event will be added to
> the dump.
> * If the event spans multiple partitions, the event will be added to the
> dump. (Filtering redundant partitions out of the message will be done as
> part of a separate task.)
> * If the partition spec matches the filter, the event will be added to the
> dump.
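The incremental rules above map onto a per-event check. This is a hypothetical sketch, not Hive code. It assumes a condition is a `(column, op, value)` tuple and that the partition specs extracted from an event are dicts with integer values. The description does not say what happens to an event on a partitioned table that carries no partition spec, so this sketch keeps it, which is an assumption. `include_event` is a hypothetical name.

```python
import operator

# Assumed comparison operators for the "very basic" filter conditions.
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


def include_event(partition_columns, event_partitions, condition):
    """Decide whether one notification event goes into the incremental dump.

    partition_columns: partition column names of the event's table
    event_partitions: partition specs (dicts) extracted from the event
    condition: (column, op, value) for the table, or None if not mentioned
    """
    if not partition_columns:
        return True  # table not partitioned
    if condition is None:
        return True  # table not mentioned in the filter
    column, op, value = condition
    if column not in partition_columns:
        raise ValueError(f"{column!r} is not a partition column")
    if len(event_partitions) != 1:
        # Multi-partition events are kept whole; events without a partition
        # spec are kept too (an assumption, the description is silent).
        return True
    (spec,) = event_partitions
    return _OPS[op](spec[column], value)
```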
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)