Lefty Leverenz commented on HIVE-16896:

Doc note:  This adds *hive.repl.approx.max.load.tasks* to HiveConf.java, so it 
needs to be documented in the wiki.

* [Configuration Properties -- Replication | 

Added a TODOC3.0 label.

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> ---------------------------------------------------------------------------------------------
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>              Labels: TODOC3.0
>             Fix For: 3.0.0
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, 
> HIVE-16896.3.patch
> We do not want to create too many tasks in memory in the analysis phase while 
> loading data. Currently we load all the files in the bootstrap dump location 
> as {{FileStatus[]}} and then iterate over it to load objects. We should 
> rather move to 
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
> {code}
> which would internally batch and return values. 
> Additionally, since we can't hand off partial tasks from the analysis phase to 
> the execution phase, we are going to move the whole repl load functionality to 
> the execution phase so we can better control creation/execution of tasks (not 
> related to hive {{Task}}; we may get rid of ReplCopyTask).
> An additional consideration to take into account at the end of this jira is 
> whether we want to specifically do a multi-threaded load of the bootstrap dump.
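
The description above amounts to walking the dump location lazily and building only a bounded number of tasks at a time. A minimal sketch of that idea follows, using only the JDK. It is not the code from the patch: the class name BoundedBatches, the methods nextBatch and drain, the sample paths, and the limit of 2 are all hypothetical, and a plain java.util.Iterator stands in for Hadoop's RemoteIterator<LocatedFileStatus>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not taken from HIVE-16896.
public class BoundedBatches {

    // Pulls at most maxTasks entries from the iterator.
    // Returns an empty list once the iterator is exhausted.
    public static <T> List<T> nextBatch(Iterator<T> source, int maxTasks) {
        if (maxTasks <= 0) {
            throw new IllegalArgumentException("maxTasks must be positive");
        }
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxTasks && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    // Convenience helper that collects every batch, useful for inspection.
    // A real loader would process each batch and then discard it.
    public static <T> List<List<T>> drain(Iterator<T> source, int maxTasks) {
        List<List<T>> batches = new ArrayList<>();
        List<T> batch;
        while (!(batch = nextBatch(source, maxTasks)).isEmpty()) {
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for the entries a lazy file listing would return.
        Iterator<String> files =
            Arrays.asList("db1/t1", "db1/t2", "db1/t3", "db1/t4", "db1/t5").iterator();

        // Only one batch is held in memory per loop iteration.
        List<String> batch;
        while (!(batch = nextBatch(files, 2)).isEmpty()) {
            System.out.println("load batch: " + batch);
        }
    }
}
```

In this sketch the limit plays the role that a setting such as hive.repl.approx.max.load.tasks would play, on the assumption that the property caps how many tasks are created before they are handed to execution.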

This message was sent by Atlassian JIRA
