[jira] Updated: (MAPREDUCE-1224) Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables

Fri, 20 Nov 2009 15:02:03 -0800

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Spencer Ho updated MAPREDUCE-1224:
----------------------------------

    Attachment: SqlManager.java

The original code at lines 66 to 68 of SqlManager was:

protected String getColNamesQuery(String tableName) {
    return "SELECT t.* FROM " + tableName + " AS t";
}

Because this method is invoked three times in the code to generate column name 
and type information, the database is queried three times.  For a large table, 
this makes the overall load query the whole table four times.

The change adds an always-false WHERE clause, which forces the database to 
return an empty result set that still carries the metadata (lines 66 to 69):

  protected String getColNamesQuery(String tableName) {
    // adding where clause to prevent loading a big table
    return "SELECT t.* FROM " + tableName + " AS t WHERE 1=0";
  }

With this change, the execution time for retrieving one of our large tables 
dropped from 40 minutes to 11 minutes.
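
For readers who want to see how the empty result set is still useful, here is 
a minimal sketch using only standard java.sql calls.  It is not the code from 
the attached SqlManager.java: the class name MetadataProbe, the helper 
getColumnTypes, and the table name "employees" are hypothetical and exist only 
for illustration.  It assumes the caller already holds an open JDBC Connection.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of sqoop.
class MetadataProbe {

    // Same query shape as the patched getColNamesQuery: the always-false
    // predicate asks the database for a result set with no rows.
    static String getColNamesQuery(String tableName) {
        return "SELECT t.* FROM " + tableName + " AS t WHERE 1=0";
    }

    // Hypothetical helper: maps each column name to its java.sql.Types code,
    // in column order, using only the metadata of the empty result set.
    static Map<String, Integer> getColumnTypes(Connection conn, String tableName)
            throws SQLException {
        Map<String, Integer> columns = new LinkedHashMap<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(getColNamesQuery(tableName))) {
            ResultSetMetaData meta = rs.getMetaData();
            // JDBC column indexes are 1-based.
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                columns.put(meta.getColumnName(i), meta.getColumnType(i));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // "employees" is a made-up table name.
        System.out.println(getColNamesQuery("employees"));
        // prints: SELECT t.* FROM employees AS t WHERE 1=0
    }
}
```

As in the original method, the table name is concatenated directly into the 
SQL string, so it should come from trusted configuration rather than from 
untrusted input.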

> Calling "SELECT t.* from <table> AS t" to get meta information is too 
> expensive for big tables
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>    Affects Versions: 0.20.1
>         Environment: all platforms, generic jdbc driver
>            Reporter: Spencer Ho
>         Attachments: SqlManager.java
>
>
> The SqlManager uses the query "SELECT t.* from <table> AS t" to get the table 
> spec, which is too expensive for big tables, and the query is issued twice to 
> generate column names and types.  For tables that are big enough to be 
> map-reduced, this cost is too high for sqoop to be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
