Hi Venkat,

Thanks for your quick response. We did some testing on this and found some limitations with these features. We will investigate further using your suggestions.
But what we are looking for here are answers about the process for contributing back to the community. It would be great if you could help us with the following questions:

1. Can a developer create a branch of Sqoop and start development directly?
2. Who decides the timelines for feature delivery?
3. What is the expected release date of Sqoop 1.4.6?
4. Who decides the feature priorities?
5. If feature priorities are decided by a product owner, can we negotiate with the PM on feature priorities?
6. Once development work is completed, who will do the code review?
7. Who will create the documentation?

Thanks and Regards,
Rakesh

From: Rakesh Sharma <[email protected]>
Date: Saturday, October 4, 2014 at 12:31 AM
To: Venkat Ranganathan <[email protected]>, "[email protected]" <[email protected]>, Atul Gupta <[email protected]>
Cc: Shashank Tandon <[email protected]>
Subject: Re: Sqoop

++Atul

From: Venkat Ranganathan <[email protected]>
Date: Friday, October 3, 2014 at 11:55 PM
To: "[email protected]" <[email protected]>
Cc: Rakesh Sharma <[email protected]>, Shashank Tandon <[email protected]>
Subject: Re: Sqoop

Atul Gupta

Please see below

>> Sqoop currently doesn't support dynamic partitions. We are planning to
>> support dynamic partitions. Basically, user can specify the partition column
>> and sqoop will figure out the partition details and create partitions if
>> they don't exist.

Dynamic partitioning has been part of Sqoop for a while as part of the hcatalog support.

>> Sqoop only supports Partitions String fields. We are planning to extend it
>> to some other data types like integer, Date etc.
Even with the hcatalog integration (and the enhancements we made to this integration to support all Hive types), this is an outstanding issue. It is being fixed in hcatalog as well.

>> 5. Sqoop doesn't support external table for hive. We are planning to add
>> this feature as well

This is also addressed by the hcatalog integration.

Venkat

On Fri, Oct 3, 2014 at 11:03 AM, Atul Gupta <[email protected]> wrote:

Hi,

Let me give you some background on the project first. Currently we are using an in-house tool for moving data from RDBMS to HDFS, and based on our business requirements we added lots of new features to that tool. The current in-house solution is not scalable and also has maintainability issues, so two months ago we decided to move to Sqoop. When we did the feature gap analysis between the in-house tool and Sqoop, we found that most of the features developed in-house are missing from Sqoop. We then decided that we should build customizations around Sqoop. We also plan to contribute back to the open source community.

The following is the list of features:

1. Sqoop currently doesn't support dynamic partitions. We are planning to support dynamic partitions. Basically, the user can specify the partition column, and Sqoop will figure out the partition details and create partitions if they don't exist.
2. Sqoop only supports partitioning on String fields. We are planning to extend it to other data types such as Integer, Date, etc.
3. Sqoop doesn't support data merge for Hive tables, especially if they are partitioned. We are planning to support merge for Hive tables.
4. Sqoop doesn't restrict the maximum load for a given mapper, so a mapper sometimes becomes overloaded and causes performance issues. We are planning to add volume-per-mapper control to Sqoop.
5. Sqoop doesn't support external tables for Hive. We are planning to add this feature as well.
6. Merge can be done on only one key. We will be enhancing it to support multi-field keys for merge.
These are at a high level, and there are a few others as well. The team is ready to work with the Sqoop dev community and is aware of the process, but we have the following open questions, and the answers would really help us in making the final call.

1. Can a developer create a branch of Sqoop and start development directly?
2. Who decides the timelines for feature delivery?
3. What is the expected release date of Sqoop 1.4.6?
4. Who decides the feature priorities?
5. If feature priorities are decided by a product owner, can we negotiate with the PM on feature priorities?
6. Once development work is completed, who will do the code review?
7. Who will create the documentation?

In case you need more clarity, we are ready to set up a WebEx/Skype call with you.

Thanks,
Atul Gupta
Engineering Manager
Expedia Inc
