Hey all,

I'm looking to get a better understanding of exactly what is involved in implementing a new connector for Sqoop 2. I've read through the documentation at http://sqoop.apache.org/docs/1.99.3/ConnectorDevelopment.html but it seems a little light on detail in places, so I'd appreciate it if people could fill in the gaps in my understanding or share their own experiences of creating a connector.

Firstly, I'd like to understand whether you must implement both Importer and Exporter, or whether you can implement just one. The connector I'm interested in developing would initially be intended for use only as an output target, i.e. taking data from relational databases using existing connectors and then outputting it in a suitable format for the databases I'm looking to support.

Whether or not both are needed, the documentation makes reference to transforming to and from an intermediate format, which is discussed on the wiki at https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation. Has the project actually made a decision on what the intermediate format looks like? Is it the CSV-style format described on that page?

How exactly is partitioning expected to work, particularly with regard to the relationship to the Extractor? The documentation says that a partitioner creates the partitions and then the extractor gets passed the partition to process. I assume Partition can be defined fairly freely (other than the need to be Writable and toString()-able) as the needs of a connector dictate.

The documentation glosses over ConnectionConfiguration (some of the sections are empty), but I assume this is the class I would use to pass in connection configuration and also whatever mapping rules are necessary to translate the data to my target format. Can I safely subclass ConnectionConfiguration, or are there other predefined mechanisms for passing connection-specific configuration?

Thanks for putting up with so many questions,

Regards,

Rob
