> On July 11, 2013, 9:54 p.m., Jarek Cecho wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java,
> >  line 184
> > <https://reviews.apache.org/r/12451/diff/1/?file=319957#file319957line184>
> >
> >     I'm a bit concerned about using the count() aggregate function, as it 
> > might lead to another full table scan, which could significantly hurt 
> > performance. Maybe we could make the null check on the split-by column 
> > optional?
> 
> Mengwei Ding wrote:
>     Yes, this is an issue. I will use 'count(1)' instead.
> 
> Jarek Cecho wrote:
>     I'm afraid that count(1) won't help either. If the database engine 
> does not store the precise number of rows (as with InnoDB in MySQL), 
> queries of the type "select count(*/1) from table" will result in a full 
> table scan, which can be quite a heavy operation.
> 
> Mengwei Ding wrote:
>     Yes, I did some research just now. Null values are not necessarily 
> indexed by the database, so retrieving all of them may require scanning 
> the whole table. I just thought of another idea: we don't necessarily need 
> to check whether the column has nulls; instead, we could always add an 
> extra partition for nulls. This way we reduce the number of full table 
> scans to one, since we cannot avoid a full table scan altogether. By the 
> way, what do you mean by making the null check on the split-by column 
> optional?

I will ask the user whether the partition column can be null.
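
To make the idea from the thread concrete, here is a minimal sketch of how an extra NULL partition could sit next to the regular range partitions. It is not the code from the patch: the class name NullAwarePartitions, the method conditions, and the columnMayBeNull flag (standing in for the user-supplied answer) are all hypothetical, and it assumes an integer partition column whose min and max are already known.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Sqoop: builds one WHERE condition per partition. */
final class NullAwarePartitions {

  private NullAwarePartitions() { }

  /**
   * Splits [min, max] into at most numPartitions contiguous ranges and, if the
   * user said the column may contain NULLs, appends one extra "IS NULL" partition.
   * No count() query is issued, so no additional table scan happens up front.
   * Assumes max - min + 1 fits in a long.
   */
  static List<String> conditions(String column, long min, long max,
                                 int numPartitions, boolean columnMayBeNull) {
    if (numPartitions < 1) {
      throw new IllegalArgumentException("numPartitions must be >= 1");
    }
    if (min > max) {
      throw new IllegalArgumentException("min must be <= max");
    }

    List<String> result = new ArrayList<>();
    long span = max - min + 1;
    // Never create more range partitions than there are distinct values.
    int parts = (int) Math.min(numPartitions, span);
    long base = span / parts;
    long remainder = span % parts;

    long lower = min;
    for (int i = 0; i < parts; i++) {
      // The first 'remainder' partitions take one extra value each.
      long size = base + (i < remainder ? 1 : 0);
      long upper = lower + size - 1;
      result.add(lower + " <= " + column + " AND " + column + " <= " + upper);
      lower = upper + 1;
    }

    if (columnMayBeNull) {
      // Rows with a NULL partition column match none of the ranges above.
      result.add(column + " IS NULL");
    }
    return result;
  }
}
```

With the flag set, rows whose partition column is NULL are picked up by the last condition; with it unset, the partitioning is unchanged and no extra query runs.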


- Mengwei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12451/#review23028
-----------------------------------------------------------


On July 10, 2013, 7:02 p.m., Mengwei Ding wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12451/
> -----------------------------------------------------------
> 
> (Updated July 10, 2013, 7:02 p.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Bugs: SQOOP-1049
>     https://issues.apache.org/jira/browse/SQOOP-1049
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> commit 47e73c30b49be0168459d76bf8993205c7a4f4fc
> Author: Mengwei Ding <[email protected]>
> Date:   Wed Jul 10 11:41:05 2013 -0700
> 
>     SQOOP-1049: Sqoop2: Record not imported if partition column value is NULL
> 
> :100644 100644 abcc89d... a940d15... M        
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorConstants.java
> :100644 100644 671bb4a... d331ae8... M        
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorError.java
> :100644 100644 96818ba... 357fefb... M        
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
> :100644 100644 4401800... ff80ed3... M        
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
> 
> 
> Diffs
> -----
> 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorConstants.java
>  abcc89d 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorError.java
>  671bb4a 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
>  96818ba 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
>  4401800 
> 
> Diff: https://reviews.apache.org/r/12451/diff/
> 
> 
> Testing
> -------
> 
> I ran a manual test in which I successfully imported a table with some 
> null values in the partition column.
> 
> 
> Thanks,
> 
> Mengwei Ding
> 
>
