[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Spencer Ho updated MAPREDUCE-1224:
----------------------------------

    Attachment: SqlManager.java

The original code at lines 66 to 68 of SqlManager was:

    protected String getColNamesQuery(String tableName) {
      return "SELECT t.* FROM " + tableName + " AS t";
    }

This method is invoked three times by the code that generates column name and type information, so the database is queried three times. For a large table, this means the import as a whole reads the entire table four times: three metadata queries plus the import itself.

The change adds an always-false WHERE clause. This forces the database to return an empty result set that still carries the column metadata (lines 66 to 69):

    protected String getColNamesQuery(String tableName) {
      // adding where clause to prevent loading a big table
      return "SELECT t.* FROM " + tableName + " AS t WHERE 1=0";
    }

With this change, the time to retrieve one of our large tables dropped from 40 minutes to 11 minutes.

> Calling "SELECT t.* from <table> AS t" to get meta information is too
> expensive for big tables
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>    Affects Versions: 0.20.1
>         Environment: all platforms, generic jdbc driver
>            Reporter: Spencer Ho
>         Attachments: SqlManager.java
>
>
> The query that SqlManager uses to get the table spec, "SELECT t.* from
> <table> AS t", is too expensive for big tables, and it is called twice to
> generate column names and types. For tables that are big enough to be
> map-reduced, this is too expensive for sqoop to be useful.
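A minimal sketch of how the zero-row query can be used through plain JDBC to read column names and types from `ResultSetMetaData` without fetching any rows. `getColNamesQuery` mirrors the patched method; the class `ColumnMetadataSketch` and the helper `readColumnTypes` are hypothetical illustrations, not part of SqlManager. It assumes, as the patch does, that the database accepts the `AS t` alias and that the driver still populates the metadata for an empty result set.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

class ColumnMetadataSketch {

  // Same query shape as the patched SqlManager.getColNamesQuery():
  // the always-false predicate means no rows come back, but the driver
  // still describes every column of the result.
  static String getColNamesQuery(String tableName) {
    return "SELECT t.* FROM " + tableName + " AS t WHERE 1=0";
  }

  // Hypothetical helper (not in SqlManager): runs the zero-row query once
  // and returns column name -> java.sql.Types code, in column order.
  static Map<String, Integer> readColumnTypes(Connection conn, String tableName)
      throws SQLException {
    Map<String, Integer> columns = new LinkedHashMap<>();
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(getColNamesQuery(tableName))) {
      ResultSetMetaData meta = rs.getMetaData();
      // JDBC column indexes are 1-based.
      for (int i = 1; i <= meta.getColumnCount(); i++) {
        columns.put(meta.getColumnName(i), meta.getColumnType(i));
      }
    }
    return columns;
  }
}
```

Usage, assuming `conn` is an open `java.sql.Connection` and `employees` is a hypothetical table name: `Map<String, Integer> cols = ColumnMetadataSketch.readColumnTypes(conn, "employees");`. Each value is a `java.sql.Types` constant such as `Types.INTEGER`.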