[
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117837#comment-16117837
]
anishek commented on HIVE-16896:
--------------------------------
Thanks [~sankarh] for the review. [~thejas]/[~daijy] please commit the patch.
> move replication load related work in semantic analysis phase to execution
> phase using a task
> ---------------------------------------------------------------------------------------------
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
> Issue Type: Sub-task
> Reporter: anishek
> Assignee: anishek
> Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch,
> HIVE-16896.3.patch
>
>
> We want to avoid creating too many tasks in memory during the analysis phase
> while loading data. Currently we load all the files in the bootstrap dump
> location as {{FileStatus[]}} and then iterate over the array to load objects.
> We should instead move to
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
> {code}
> which would batch internally and return values incrementally.
> Additionally, since we can't hand off partial tasks from the analysis phase to
> the execution phase, we are going to move the whole repl load functionality to
> the execution phase so we can better control creation/execution of tasks (not
> related to hive {{Task}}; we may get rid of ReplCopyTask).
> An additional consideration at the end of this jira is whether we want to
> do a multi-threaded load of the bootstrap dump specifically.
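
The contrast the description draws, materialising every entry as an array versus pulling entries from an iterator, can be sketched roughly as below. This is only an illustration and not the patch: it uses the JDK's java.nio.file.Files.walk as a stand-in for Hadoop's listFiles(Path, boolean) so that it runs without Hadoop on the classpath, and the class name LazyListingSketch, the method forEachBatch, and the batch size are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical sketch: consume a directory listing lazily, a batch at a time,
// instead of holding the complete listing in memory as one array.
final class LazyListingSketch {

    // Walks root lazily and passes regular files to handler in batches of
    // at most batchSize. Returns the total number of files seen.
    static int forEachBatch(Path root, int batchSize, Consumer<List<Path>> handler)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        // Files.walk is lazy: entries are read as the iterator advances.
        try (Stream<Path> walk = Files.walk(root)) {
            Iterator<Path> it = walk.filter(Files::isRegularFile).iterator();
            List<Path> batch = new ArrayList<>(batchSize);
            while (it.hasNext()) {
                batch.add(it.next());
                total++;
                if (batch.size() == batchSize) {
                    // Hand over a copy so the handler may keep the list.
                    handler.accept(new ArrayList<>(batch));
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                handler.accept(new ArrayList<>(batch));
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway directory standing in for a dump location.
        Path root = Files.createTempDirectory("bootstrap-dump");
        for (int i = 0; i < 5; i++) {
            Files.createFile(root.resolve("part-" + i));
        }
        int n = forEachBatch(root, 2,
                batch -> System.out.println("loading " + batch.size() + " file(s)"));
        System.out.println("total files: " + n);
    }
}
```

With this shape only one batch of entries is held at a time, which is the property the description is after. A version against Hadoop would loop over the RemoteIterator with hasNext()/next() in the same way.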