> On July 10, 2014, 8:22 a.m., Venkat Ranganathan wrote:
> > src/java/org/apache/sqoop/manager/MainframeManager.java, line 75
> > <https://reviews.apache.org/r/22516/diff/1/?file=608148#file608148line75>
> >
> > Is import into HBase and Accumulo supported by this tool? It looks like the only target supported is HDFS text files, judging from the command help.
> 
> Mariappan Asokan wrote:
> Each record in a mainframe dataset is treated as a single field (or column). So, theoretically, HBase, Accumulo, and Hive are supported, but with limited usability. That is why I did not add them to the documentation. If you feel strongly that they should be documented, I can work on that in the next version of the patch.
> 
> Venkat Ranganathan wrote:
> I feel it would be good to say that we import only as text files and leave further processing, such as loading into Hive/HBase, up to the user, since the composition of the records and the needed processing differ and the schema cannot be inferred.
> 
> Mariappan Asokan wrote:
> I agree with you. To avoid confusion, I plan to remove support for parsing the input format, output format, Hive, HBase, HCatalog, and codegen options. This will synchronize the documentation with the code. What do you think?
> 
> Venkat Ranganathan wrote:
> Sorry for the delay. I was wondering whether the mainframe connector could just define connector-specific extra args instead of creating another tool. Please see NetezzaManager or DirectNetezzaManager as an example. Maybe you have to invent a new synthetic URI format, say jdbc:mfftp:<host address>:<port>/dataset, and choose your connection manager when the --connect option is given with that URI format. That should simplify a whole lot, in my opinion. What do you think?
> 
> Mariappan Asokan wrote:
> Thanks for your suggestions. Sorry I did not get back sooner. In Sqoop 1.x, there is a strong assumption that the input source is always a database table.
> Due to this, the Sqoop import tool has many options that are relevant to a source database table. A mainframe source is totally different from a database table, so I think it is better to create a separate tool for mainframe import rather than just a new connection manager. The mainframe import tool will not support many of the options that the database import tool supports, and it will have its own options that the database import tool does not. At present, these are the host name and the partitioned dataset name. In the future, the mainframe import tool may be enhanced with metadata-specific or connection-specific arguments unique to the mainframe. Creating a synthetic URI for a connection seems somewhat artificial to me.
> 
> Contrary to what I stated before, considering possible future enhancements, I now think it is better to retain the support for parsing the input format, output format, Hive, HBase, HCatalog, and codegen options. The documentation will be enhanced in the future to reflect this support.
> 
> Venkat Ranganathan wrote:
> Thanks for your thoughts on the suggestion. As you correctly pointed out, Sqoop 1.x has a JDBC model; that is why you had to implement a ConnectionManager and provide pseudo values for column types, etc. (always returning VARCHAR). I understand there will be options that mainframe import will not support (much as there are MySQL-, Netezza-, or SQL Server-specific options), and that you want to have specific metadata for mainframe import; that may be tricky. Connection-specific arguments can be implemented the same way JDBC connection-specific arguments are.
> 
> The reason for my suggestion was primarily to piggyback on the implementation for imports into Hive/HBase in the future, when you have the ability to provide specific metadata on the data. You can definitely parse the various options, but you have to explicitly check and exit if any of the unsupported options are used.
> My only worry with this tool is that it may be a one-off for mainframe imports alone: we will be starting off with HDFS import only until you get to the rest of the parts, and when we finally see this, it will basically be duplicating some of the code and may be difficult to maintain.
> 
> Gwen Shapira wrote:
> I just checked the possibility of adding non-JDBC imports as part of the import tool, using a fake connection URL as you suggested. This is not feasible: ConnManager (which you need to inherit) has to implement getConnection, which returns java.sql.Connection. You cannot return this connection object for an FTP transfer. The same goes for readTable, which must return a ResultSet.
> 
> I think a separate tool is the only way to go.
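Gwen's point about the ConnManager contract can be illustrated with simplified stand-ins for the Sqoop types. The class names echo Sqoop's, but the bodies are a minimal sketch of the constraint being discussed, not the actual MainframeManager implementation:

```java
import java.sql.Connection;

// Simplified stand-in for ConnManager: the contract forces every
// manager to be able to hand back a JDBC Connection.
abstract class ConnManagerSketch {
    public abstract Connection getConnection() throws java.sql.SQLException;
}

// An FTP-backed mainframe manager has no JDBC connection to offer.
// One way to satisfy the contract anyway (an illustrative guess at
// the workaround, not the real code) is to return null and keep all
// work off the JDBC code paths.
class MainframeManagerSketch extends ConnManagerSketch {
    @Override
    public Connection getConnection() {
        return null; // no java.sql.Connection exists for an FTP session
    }
}
```

The sketch shows why a "fake connection URL" alone does not help: the type system still demands a java.sql.Connection, so the subclass must either return null or throw.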
Never mind :) I missed the fact that the Mainframe tool actually extends ConnManager anyways.

- Gwen


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22516/#review47555
-----------------------------------------------------------


On June 14, 2014, 10:46 p.m., Mariappan Asokan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22516/
> -----------------------------------------------------------
> 
> (Updated June 14, 2014, 10:46 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is to move mainframe datasets to Hadoop.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/MainframeManager.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetFTPRecordReader.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetImportMapper.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetInputFormat.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetInputSplit.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetRecordReader.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeImportJob.java PRE-CREATION
>   src/java/org/apache/sqoop/tool/MainframeImportTool.java PRE-CREATION
>   src/java/org/apache/sqoop/tool/SqoopTool.java dbe429a
>   src/java/org/apache/sqoop/util/MainframeFTPClientUtils.java PRE-CREATION
>   src/test/org/apache/sqoop/manager/TestMainframeManager.java PRE-CREATION
>   src/test/org/apache/sqoop/mapreduce/TestMainframeDatasetFTPRecordReader.java PRE-CREATION
>   src/test/org/apache/sqoop/mapreduce/TestMainframeDatasetInputFormat.java PRE-CREATION
>   src/test/org/apache/sqoop/mapreduce/TestMainframeDatasetInputSplit.java PRE-CREATION
>   src/test/org/apache/sqoop/mapreduce/TestMainframeImportJob.java PRE-CREATION
>   src/test/org/apache/sqoop/tool/TestMainframeImportTool.java PRE-CREATION
>   src/test/org/apache/sqoop/util/TestMainframeFTPClientUtils.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/22516/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Mariappan Asokan
> 
>
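As an aside on Venkat's point earlier in the thread, that the tool can "parse the various options, but you have to explicitly check and exit if the unsupported options are used", a fail-fast check might look like the sketch below. The option names and the class are illustrative assumptions for this sketch, not Sqoop's definitive option list or validation code:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative "parse everything, but fail fast on unsupported options"
// check for a mainframe import tool.
class MainframeOptionCheckSketch {
    // Hypothetical set of options the mainframe tool would reject.
    static final List<String> UNSUPPORTED = Arrays.asList(
            "--hive-import", "--hbase-table", "--hcatalog-table");

    // Returns the first unsupported option found, or null if none is present.
    static String firstUnsupported(String[] args) {
        for (String arg : args) {
            if (UNSUPPORTED.contains(arg)) {
                return arg;
            }
        }
        return null;
    }
}
```

In a real tool the caller would print the offending option and exit with a non-zero status; keeping the check as a pure function, as above, makes it easy to unit test.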
