[ https://issues.apache.org/jira/browse/SQOOP-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766753#comment-15766753 ]
chen kai commented on SQOOP-951:
--------------------------------

Removing the parent path if a subdirectory exists? This will cause more splits:

```
private void scanSubDirectory(Path path, FileSystem fs, List<Path> pathList)
    throws IOException {
  FileStatus[] status = fs.listStatus(path);
  // remove parent path
  pathList.remove(path);
  for (FileStatus fstat : status) {
    if (fstat.isDir()) {
      pathList.add(fstat.getPath());
      scanSubDirectory(fstat.getPath(), fs, pathList);
    } else {
      pathList.add(fstat.getPath());
    }
  }
}
```

> --export-dir to support subdirectories
> --------------------------------------
>
> Key: SQOOP-951
> URL: https://issues.apache.org/jira/browse/SQOOP-951
> Project: Sqoop
> Issue Type: Improvement
> Affects Versions: 1.4.3
> Environment: Debian GNU/Linux 6.0
> Reporter: Matthieu Labour
> Assignee: Vasanth kumar RJ
> Attachments: SQOOP-951.patch
>
> I am using sqoop-1.4.2 to export to SQL.
> --export-dir does not work when the directory being passed is the root of subdirectories. --export-dir does not do any recursive lookup for files; it expects a directory containing the files you want to export.
> It would be great if one could pass a directory with subdirectories.
> Example:
> The following command exports the data to SQL:
> ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect jdbc:postgresql://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:XXXX/xxxxxxxxxxxxx --username xxxxxxxxxx --password xxxxxxxxxx --table ml_ys_log_gmt_daily_experiment_2 --export-dir=hdfs:///mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-01 --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
>
> hadoop@domU-XX-XX-XX-XX-XX-XX:/mnt/var/lib/hadoop/steps/2$ hadoop fs -ls hdfs:///mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-01
> Found 1 items
> -rw-r--r--   1 hadoop supergroup   15931406 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-01/part-r-00001
>
> The following command does not export the data to SQL:
> ~/sqoop-1.4.2.bin__hadoop-1.0.0/bin/sqoop export --connect jdbc:postgresql://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:XXXX/xxxxxxxxxxxxx --username xxxxxxxxxx --password xxxxxxxxxx --table ml_ys_log_gmt_daily_experiment_2 --export-dir=hdfs:///mnt/var/lib/hadoop/dfs/logs_daily_sanitized --input-fields-terminated-by='\t' --lines-terminated-by='\n' --verbose --batch
>
> hadoop@domU-XX-XX-XX-XX-XX-XX:/mnt/var/lib/hadoop/steps/2$ hadoop fs -ls hdfs:///mnt/var/lib/hadoop/dfs/logs_daily_sanitized/
> Found 44 items
> -rw-r--r--   1 hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/_SUCCESS
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-01
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-02
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-03
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-04
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-05
> drwxr-xr-x   - hadoop supergroup          0 2013-03-15 17:03 /mnt/var/lib/hadoop/dfs/logs_daily_sanitized/dt=2013-02-06

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
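The recursive scan in the comment above targets Hadoop's `FileSystem`/`FileStatus` API and only runs inside a Hadoop deployment. As a rough, self-contained sketch of the same traversal idea on the local filesystem with plain `java.io.File` (class and method names here are illustrative, not Sqoop's; it also differs from the patch code in that it collects only leaf files, never directories, so directories cannot produce extra splits):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class ScanDemo {
    // Recursively collect only regular files under dir; directories are
    // descended into but never added to the result list.
    static void scanSubDirectory(File dir, List<File> fileList) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or it could not be read
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                scanSubDirectory(entry, fileList); // recurse, do not add the dir
            } else {
                fileList.add(entry);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree mirroring the listing in the issue:
        //   root/_SUCCESS
        //   root/dt=2013-02-01/part-r-00001
        File root = Files.createTempDirectory("export-dir").toFile();
        File sub = new File(root, "dt=2013-02-01");
        sub.mkdir();
        new File(sub, "part-r-00001").createNewFile();
        new File(root, "_SUCCESS").createNewFile();

        List<File> files = new ArrayList<>();
        scanSubDirectory(root, files);
        for (File f : files) {
            System.out.println(f.getName());
        }
        // Prints only the leaf files (_SUCCESS, part-r-00001); the
        // dt=2013-02-01 directory itself never appears in the list.
    }
}
```

A real fix inside Sqoop would, of course, keep using `fs.listStatus(path)` and `FileStatus.isDir()` as in the patch snippet; the point of the sketch is only that the recursion should add file paths and descend into directory paths, rather than adding both.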