[https://issues.apache.org/jira/browse/SQOOP-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258985#comment-14258985]
Qian Xu commented on SQOOP-1602:
--------------------------------
I realized that this JIRA is invalid. Here is the whole story: if I do not
specify the number of extractors, 10 map tasks are created by default.
Consequently, 10 loader instances are created, and with my implementation each
one writes its own temporary dataset, so there are 10 temporary datasets. Their
sizes are in fact equal. However, there is still a merge operation at the
destroy stage, which merges the 10 datasets into one. The number and size of the
resulting file segments are controlled internally by Kite, so only 3 file
segments of different sizes may remain.
Very sorry!
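To make the explanation above concrete, here is a minimal sketch under stated assumptions. It is not Sqoop or Kite code: `distribute_rows` and `merge_datasets` are hypothetical helpers, the even split across loaders is an assumption, and the row count is purely illustrative. It models 10 loaders that each write an equal-sized temporary dataset, which is then merged into a single dataset at the destroy stage. How Kite subsequently lays out the merged data in file segments is deliberately not modeled, since that is decided inside Kite.

```python
# Hypothetical model only (not Sqoop or Kite code): each loader writes the rows
# it receives into its own temporary dataset, and a destroy-stage step merges
# all temporary datasets into one.

def distribute_rows(num_rows, num_loaders):
    """Split row ids into num_loaders nearly equal, contiguous chunks."""
    base, extra = divmod(num_rows, num_loaders)
    datasets = []
    start = 0
    for i in range(num_loaders):
        # The first `extra` loaders receive one additional row each.
        size = base + (1 if i < extra else 0)
        datasets.append(list(range(start, start + size)))
        start += size
    return datasets


def merge_datasets(datasets):
    """Concatenate the temporary datasets into a single dataset, in order."""
    merged = []
    for ds in datasets:
        merged.extend(ds)
    return merged


temp = distribute_rows(100_000, 10)  # illustrative row count, 10 loaders
print([len(d) for d in temp])        # ten temporary datasets of 10000 rows each
merged = merge_datasets(temp)
print(len(merged))                   # 100000
```

The point of the sketch is that the loaders themselves receive equal shares; any unevenness observed afterwards comes from the file layout chosen after the merge, not from how records were balanced across loaders.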
> Sqoop2: Fix the current balancing across Loaders internal to Sqoop
> --------------------------------------------------------------------
>
> Key: SQOOP-1602
> URL: https://issues.apache.org/jira/browse/SQOOP-1602
> Project: Sqoop
> Issue Type: Improvement
> Reporter: Veena Basavaraj
> Assignee: Qian Xu
> Fix For: 1.99.5
>
>
> The balancing of records across the loaders is done internally in Sqoop today.
> While writing the Kite connector, Qian noticed that this is not done evenly:
> While testing the Kite connector, I allocated 2 loaders. I expected the data
> to be split 50/50 between them, but the second loader actually does nothing,
> because its DataReader has no data to provide. Is this by design?
> >> About loaders not receiving data in a balanced way:
> My scenario: 4 "jdbc_mysql" extractors extract 100k rows of data (10 MB),
> and 2 Kite loaders read the data.
> This must be a bug that needs to be fixed in Sqoop ([~abec] confirmed it).