[ https://issues.apache.org/jira/browse/SQOOP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13580839#comment-13580839 ]
Hari Shreedharan commented on SQOOP-777:
----------------------------------------
Hi Jarcec,
I am sorry, I was not part of the project at that time, so I don't have much
background on that discussion. But I definitely do not agree that text is a
good intermediate format.
I am not sure why we should be comparing against mysqldump or pg_dump, or
whether their performance is actually due to their format. Since we are
primarily interested in reading directly from the database (rather than from
the dumps), I don't really understand why text would perform better than a
binary format like Avro.
Also, with text it becomes complex to encode field names and schemas (other
than by forcing a JSON-like schema or adding header-like structures).
I might be wrong on multiple fronts here, but text is inherently expensive to
parse and generate anyway, so I don't see much benefit there either.
> Sqoop2: Implement intermediate data format representation policy
> -----------------------------------------------------------------
>
> Key: SQOOP-777
> URL: https://issues.apache.org/jira/browse/SQOOP-777
> Project: Sqoop
> Issue Type: New Feature
> Affects Versions: 2.0.0
> Reporter: Jarek Jarcec Cecho
> Assignee: Hari Shreedharan
> Fix For: 2.0.0
>
>
> We should enforce our intermediate data format policy, as currently each
> driver can do it differently and that might break things.