[jira] [Commented] (PHOENIX-2723) Make BulkLoad able to load several tables at once

Sergey Soldatov (JIRA) Tue, 15 Mar 2016 02:20:09 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194981#comment-15194981
 ]


Sergey Soldatov commented on PHOENIX-2723:
------------------------------------------

Well, the logic is quite simple. If there are several input files and one table 
name, all of those files are loaded into that table. Otherwise, the number of 
tables needs to equal the number of inputs. The advantages are that it avoids 
writing iterating scripts, reduces the time spent on job creation and 
scheduling, and, in theory, puts a better load on the cluster.
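
To make the pairing rule concrete, here is a minimal sketch in plain Java. It is 
not the actual patch: the class TableInputPairing and its methods are 
hypothetical names used only for illustration, and it assumes both arguments 
arrive as comma-separated strings, as in the proposed syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Phoenix): pairs each input path with its target table. */
final class TableInputPairing {

    /** Splits a comma-separated argument, trimming whitespace and dropping empty entries. */
    static List<String> splitList(String csv) {
        List<String> out = new ArrayList<>();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    /** Returns a map of input path to table name, in the order the inputs were given. */
    static Map<String, String> pair(String tablesArg, String inputsArg) {
        List<String> tables = splitList(tablesArg);
        List<String> inputs = splitList(inputsArg);
        if (tables.isEmpty() || inputs.isEmpty()) {
            throw new IllegalArgumentException("At least one table and one input are required");
        }
        // Either one table receives every input, or the two lists line up one to one.
        if (tables.size() != 1 && tables.size() != inputs.size()) {
            throw new IllegalArgumentException(
                "Got " + tables.size() + " tables but " + inputs.size() + " inputs");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < inputs.size(); i++) {
            String table = tables.size() == 1 ? tables.get(0) : tables.get(i);
            result.put(inputs.get(i), table);
        }
        return result;
    }
}
```

With this sketch, pair("table1", "input1,input2") sends both inputs to table1, 
while pair("table1,table2", "input1,input2,input3") is rejected because the 
counts differ.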

> Make BulkLoad able to load several tables at once
> -------------------------------------------------
>
>                 Key: PHOENIX-2723
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2723
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>
> Bulk load is usually required for more than one table, and it is usually 
> done by running jobs one by one. The idea is to provide lists of tables and 
> corresponding input sources to the MR BulkLoad job. The syntax could be 
> something like:
> yarn ... CsvBulkLoadTool -t table1,table2,table3 --input input1,input2,input3
> Having a map tableName => input, during the map phase we can determine which 
> table the current split belongs to and produce the necessary tableRowKeyPair. 
> Any thoughts or suggestions?
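
The map-phase lookup mentioned in the quoted description could be sketched 
roughly as below. This is only an illustration under assumptions: 
SplitTableResolver is a hypothetical class, the split is represented by its 
file path as a plain string (in a real mapper it would likely come from the 
input split, for example FileSplit.getPath()), and the map is keyed by input 
path rather than by table name so that several inputs can share one table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: resolves which table a split belongs to from its file path. */
final class SplitTableResolver {

    // Input path (file or directory) -> target table name.
    private final Map<String, String> inputToTable;

    SplitTableResolver(Map<String, String> inputToTable) {
        this.inputToTable = new LinkedHashMap<>(inputToTable);
    }

    /**
     * Returns the table whose input path equals splitPath or is its longest
     * parent directory, or null when no configured input matches.
     */
    String tableFor(String splitPath) {
        String bestTable = null;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : inputToTable.entrySet()) {
            String input = entry.getKey();
            // Compare against "input/" so that /data/a does not match /data/ab.
            String asDirectory = input.endsWith("/") ? input : input + "/";
            boolean matches = splitPath.equals(input) || splitPath.startsWith(asDirectory);
            if (matches && input.length() > bestLength) {
                bestLength = input.length();
                bestTable = entry.getValue();
            }
        }
        return bestTable;
    }
}
```

The longest-match rule is a design choice of this sketch, made so that nested 
input directories resolve to the most specific table; the ticket itself does 
not specify how overlapping inputs should be handled.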



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
