On Nov. 18, 2014, 5 p.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     Please see the latest revision; it uses JSON. JSON is pretty universal, 
> so I'd rather use it.
> 
> Jarek Cecho wrote:
>     Why JSON for the intermediate data format? I would much rather use 
> something that is used by pg_dump and mysqldump (or Hive) rather than JSON. 
> The IDF is just an intermediate format; we should be as efficient here as 
> possible, and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     It so happens that they use the same format too. I would prefer you look 
> at the patch to see why we are using an encoding such as JSON to represent 
> arrays and maps. Imagine a case of [[a,b],[z,q],[w,e]]: it is simply 
> represented as a JSON string, so parsing it back to an actual array of 
> arrays is a no-brainer using the JSON parse method. The code also tells the 
> CSV IDF to ignore any commas within the JSON string (see the sketch below 
> this quoted thread).
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format, I'm fine. But I 
> don't want to introduce a different format than those two without a good 
> reason.
> 
> Veena Basavaraj wrote:
>     I don't know about Hive, and if there is no standard they all use, then 
> we cannot do much.
>     And unless we have Hive/Postgres connectors using this code, I would not 
> over-optimize for it. JSON serializers are pretty fast, especially Jackson, 
> which we should adopt in many places; there is a ticket for that as well.
>     I would cross that bridge when we get there; if performance is a concern 
> for certain data sources, changing this to any other encoding should be 
> straightforward.
> 
> Jarek Cecho wrote:
>     I believe that part of this JIRA is to go and explore how Hive and 
> pg_dump are serializing data, so we can provide the best approach in Sqoop 
> 2. That is what we did when designing the CSV IDF, and we should keep doing 
> that. The CSV IDF is not internal, as it's exposed as a standard to other 
> connectors, and we should not change it in a backward-incompatible way in 
> the future. Hence we should be careful when introducing new syntax and 
> rules, and do the due diligence of investigating what is best.
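
For reference, here is a minimal sketch of the encoding described in the 
quoted thread above, assuming json-simple's JSONValue (a library Sqoop 2 
already uses); the class name and the single-quote wrapping are illustrative, 
not the actual CSVIntermediateDataFormat code:

    import java.util.Arrays;
    import java.util.List;

    import org.json.simple.JSONArray;
    import org.json.simple.JSONValue;

    public class NestedArrayEncodingSketch {
      public static void main(String[] args) {
        List<List<String>> nested = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("z", "q"),
            Arrays.asList("w", "e"));

        // Encode the whole nested array as one JSON string column:
        // [["a","b"],["z","q"],["w","e"]]
        String jsonColumn = JSONValue.toJSONString(nested);

        // Embed it in a CSV row; the quoting keeps the commas inside the
        // JSON string from being read as field separators.
        String csvRow = "1,'" + jsonColumn + "'";
        System.out.println(csvRow);

        // Decode by handing the column text straight back to the JSON
        // parser; no hand-rolled bracket matching is needed.
        JSONArray parsed = (JSONArray) JSONValue.parse(jsonColumn);
        System.out.println(parsed.get(1)); // ["z","q"]
      }
    }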

I am still not sure what the concern is; how we store the IDF is, and should 
be, internal to Sqoop. I will let Abe weigh in as well. The fact that it is 
called an IDF means that Sqoop is responsible for reading from the FROM side, 
storing the data in its IDF, and then, when it passes data to the output 
committer/format (in the MR case), converting it back into Java objects. So 
if I change the internal encoding from CSV to something else, it should not 
affect any connector developer.

Of course, if I change what an array means, then it will affect them.
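
To make that concrete, here is a minimal sketch (simplified names, not the 
actual Sqoop 2 IntermediateDataFormat API) of why the internal encoding stays 
swappable:

    // Sketch only: connectors program against the object-level contract,
    // so the text encoding (CSV plus JSON today, something else tomorrow)
    // stays private to the IDF implementation.
    public abstract class SketchIDF {
      // Connector-facing contract: plain Java objects in and out.
      public abstract void setObjectData(Object[] fields);
      public abstract Object[] getObjectData();

      // Internal wire format used between the FROM side and the output
      // committer/format; changing it does not touch connector code.
      protected abstract String encode(Object[] fields);
      protected abstract Object[] decode(String text);
    }

As long as setObjectData/getObjectData keep their meaning, the encode/decode 
pair can move from CSV to anything else without a connector-visible change.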


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 9:08 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 9:08 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> See the JIRA.
> 
> Supporting nested arrays is going to involve dodgy logic.
> 
> Some of the assumptions in the parsing need more documentation, especially 
> the use of special characters to parse nested arrays.
> 
> Feedback is welcome on how I can improve this further.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: Whitespace issues will be addressed once the patch is reviewed for 
> functionality, and this needs a rebase since I have other patches pending 
> review that overlap.
> 
> 
> Diffs
> -----
> 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
>  39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java
>  5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java
>  4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   
> connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java
>  fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> Yes, unit tests added.
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   
> https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>
