> On May 31, 2013, 8:30 p.m., Abraham Elmahrek wrote:
> > Seems good except for a few comments and nit picks.
> > 
> > The datetime split logic is when certain intervals are "hotter" than 
> > others. IE: 1000000 rows out of 20000000 exist between the date range of 
> > december 1st to december 31st, but a user is importing the entire years 
> > data, with 12 node. Basically, 1 machine will extract 1000000 rows, while 
> > the others will extract 1000000/11 rows. In the future we could probably 
> > add some upfront analysis of the data to improve distribution.
> 
> Abraham Elmahrek wrote:
>     Sorry... my wording above wasn't very concise... The datetime split logic 
> doesn't seem to handle "hot" partitions is what I meant.

Good point Abraham - even though this skew can happen to all datatypes that are 
used for splitting, with dates, there is more chance of such skew.  Typically 
the solution for this as also to efficiently manage the data life-cycle is 
dependent on the database facilities, so we may have to do database specific 
functionality like partitioning etc.   A generic solution can be attempted, but 
support for grouping and ordering based on expressions varies between different 
databases.   But something to keep in our mind


> On May 31, 2013, 8:30 p.m., Abraham Elmahrek wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java,
> >  lines 164-180
> > <https://reviews.apache.org/r/11537/diff/2/?file=299088#file299088line164>
> >
> >     Nit: This seems like it can be merged above.

I am assuming you are talking about the end condition.   Please see above


> On May 31, 2013, 8:30 p.m., Abraham Elmahrek wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java,
> >  lines 139-163
> > <https://reviews.apache.org/r/11537/diff/2/?file=299088#file299088line139>
> >
> >     Nit: This loop can probably handle all partitions by using 
> > constructDateConditions(objLB, objUB, false) to 
> > constructDateConditions(objLB, objUB, upperBound == maxDateValue). Also, 
> > upperBound += (i < remainder) ? 1 : 0;

That is true, but we have the pattern to add the last partition separately for 
all partitioning types.   It is better to be consistent, I thought.   Do you 
agree?


> On May 31, 2013, 8:30 p.m., Abraham Elmahrek wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java,
> >  line 190
> > <https://reviews.apache.org/r/11537/diff/2/?file=299088#file299088line190>
> >
> >     There doesn't seem to be any thing stopping partitionMaxValue from 
> > being null. So, it makes sense to add some null checking here as well.

min and max being null is handled for all datatypes.   NULL in the presence of 
all values will be treated as having a lower value and so it is impossible to 
get MAX to null while MIN is not null.

In fact, when one row has a non-null value for a column with all the other rows 
in the table having the split by column as NULL will return the single non-null 
value for both min and max.

So, the whole null handling thing has to be cleaned up.   I will file a follow 
on JIRA for that


> On May 31, 2013, 8:30 p.m., Abraham Elmahrek wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java,
> >  line 232
> > <https://reviews.apache.org/r/11537/diff/2/?file=299088#file299088line232>
> >
> >     There doesn't seem to be any thing stopping partitionMaxValue from 
> > being null. So, it makes sense to add some null checking here as well.

Please see above


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11537/#review21237
-----------------------------------------------------------


On May 30, 2013, 6:25 p.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11537/
> -----------------------------------------------------------
> 
> (Updated May 30, 2013, 6:25 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> This addresses Boolean, date, time, and timestamp splitters.
> 
> THis also disallows char type splitters as discussed in SQOOP-976
> 
> 
> Diffs
> -----
> 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
>  f80f30d 
>   
> connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportPartitioner.java
>  ee314d0 
> 
> Diff: https://reviews.apache.org/r/11537/diff/
> 
> 
> Testing
> -------
> 
> Introduced new unit tests to test new functionality
> All tests pass
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>

Reply via email to