anishek created HIVE-16896:
------------------------------

Summary: move replication load related work in semantic analysis phase to execution phase using a task
Key: HIVE-16896
URL: https://issues.apache.org/jira/browse/HIVE-16896
Project: Hive
Issue Type: Improvement
Reporter: anishek
Assignee: anishek

We want to avoid creating too many tasks in memory during the analysis phase while loading data. Currently we load all the files in the bootstrap dump location as {{FileStatus[]}} and then iterate over it to load objects. We should instead move to

{code}
org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
{code}

which would batch internally and return values incrementally.

Additionally, since we can't hand off partial tasks from the analysis phase to the execution phase, we are going to move the whole repl load functionality to the execution phase so we can better control the creation and execution of tasks (not related to hive {{Task}}; we may get rid of ReplCopyTask).

An additional consideration at the end of this jira is whether we want to do a multi-threaded load of the bootstrap dump.
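
To illustrate the idea of pulling files lazily and creating only a bounded number of tasks at a time, here is a minimal sketch. It is not Hive code. The {{RemoteIterator}} interface below is a local stand-in that only mirrors the shape of {{org.apache.hadoop.fs.RemoteIterator}} ({{hasNext()}} and {{next()}}, both allowed to throw {{IOException}}), so the example runs without Hadoop on the classpath. {{listFiles}}, {{nextBatch}}, the {{"load:"}} task strings and the batch size are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LazyReplLoadSketch {

    /** Stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator (assumption: used here only to avoid a Hadoop dependency). */
    interface RemoteIterator<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Hypothetical helper: wraps an in-memory list to play the role of FileSystem.listFiles(path, true). */
    static RemoteIterator<String> listFiles(List<String> paths) {
        Iterator<String> it = paths.iterator();
        return new RemoteIterator<String>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() { return it.next(); }
        };
    }

    /** Pulls at most maxTasks entries from the iterator and returns the "tasks" created for this round. */
    static List<String> nextBatch(RemoteIterator<String> files, int maxTasks) throws IOException {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxTasks && files.hasNext()) {
            // A real implementation would build a load task here; a string stands in for it.
            batch.add("load:" + files.next());
        }
        return batch;
    }

    public static void main(String[] args) throws IOException {
        RemoteIterator<String> files =
            listFiles(Arrays.asList("db1/t1/f0", "db1/t1/f1", "db1/t2/f0"));
        List<String> batch;
        // Only one batch of tasks is held in memory at a time; the iterator keeps the position.
        while (!(batch = nextBatch(files, 2)).isEmpty()) {
            System.out.println("executing " + batch);
        }
    }
}
```

Because the iterator itself carries the position in the listing, the execution phase can stop after any batch and resume later, which is the property that handing off a fully built task list from the analysis phase does not give us.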