Hi Venkat,

Thanks for your quick response. We did some testing on this and found some
limitations with these features. We will investigate further with your
suggestions.

What we are looking for here, though, is answers about the process for
contributing back to the community. It would be great if you could help us
with the following questions:

1.       Can a developer create a branch of Sqoop and start development
directly?
2.       Who decides the timelines for feature delivery?
3.       What is the expected release date of Sqoop 1.4.6?
4.       Who decides the feature priorities?
5.       If feature priorities are decided by a product owner, can we
negotiate with the PM on feature priorities?
6.       Once development work is completed, who will do the code review?
7.       Who will create the documentation?

Thanks and Regards,
Rakesh.


From: Rakesh Sharma <[email protected]>
Date: Saturday, October 4, 2014 at 12:31 AM
To: Venkat Ranganathan <[email protected]>, "[email protected]"
<[email protected]>, Atul Gupta <[email protected]>
Cc: Shashank Tandon <[email protected]>
Subject: Re: Sqoop

++Atul

From: Venkat Ranganathan <[email protected]>
Date: Friday, October 3, 2014 at 11:55 PM
To: "[email protected]" <[email protected]>
Cc: Rakesh Sharma <[email protected]>, Shashank Tandon <[email protected]>
Subject: Re: Sqoop

Atul Gupta

Please see below


>> Sqoop currently doesn't support dynamic partitions. We are planning to 
>> support dynamic partitions. Basically, user can specify the partition column 
>> and sqoop will figure out the partition details and create partitions if 
>> they don't exist.

Dynamic partitioning has been part of Sqoop for a while, as part of the
HCatalog support.

>>  2.  Sqoop only supports Partitions String fields. We are planning to extend
>> it to some other data types like integer, Date etc.
Even with the HCatalog integration (and the enhancements we made to this
integration to support all Hive types), this is an outstanding issue. It is
also being fixed in HCatalog.

>>  5.  Sqoop doesn't support external table for hive. We are planning to add 
>> this feature as well

This is also addressed by the HCatalog integration.

Venkat

On Fri, Oct 3, 2014 at 11:03 AM, Atul Gupta <[email protected]> wrote:
Hi,

Let me give you some background on the project first. We currently use an
in-house tool for moving data from RDBMS to HDFS, and based on our business
requirements we added many new features to that tool. The in-house solution
is not scalable and also has maintainability issues, so two months ago we
decided to move to Sqoop. When we did a feature gap analysis between the
in-house tool and Sqoop, we found that most of the features developed
in-house are missing from Sqoop, so we decided to build customizations around
Sqoop. We also plan to contribute back to the open source community. The
following is the list of features:

  1.  Sqoop currently doesn't support dynamic partitions. We are planning to
support dynamic partitions. Basically, the user can specify the partition
column and Sqoop will figure out the partition details and create partitions
if they don't exist.
  2.  Sqoop only supports partitioning on String fields. We are planning to
extend it to other data types such as Integer and Date.
  3.  Sqoop doesn't support data merge for Hive tables, especially if they are
partitioned. We are planning to support merge for Hive tables.
  4.  Sqoop doesn't restrict the maximum load for a given mapper, so a mapper
sometimes becomes overloaded, which causes performance issues. We are planning
to add volume-per-mapper control to Sqoop.
  5.  Sqoop doesn't support external tables for Hive. We are planning to add
this feature as well.
  6.  Merge can be done on only one key. We will be enhancing it to support
multiple-field keys for merge.
These are the high-level items, and there are a few others as well. The team
is ready to work with the Sqoop dev community and is aware of the process, but
we have the following open questions, and answers to them would really help us
make the final decision.


1.       Can a developer create a branch of Sqoop and start development
directly?

2.       Who decides the timelines for feature delivery?

3.       What is the expected release date of Sqoop 1.4.6?

4.       Who decides the feature priorities?

5.       If feature priorities are decided by a product owner, can we
negotiate with the PM on feature priorities?

6.       Once development work is completed, who will do the code review?

7.       Who will create the documentation?

If you need more clarity, we are ready to set up a WebEx/Skype call with you.

Thanks,
Atul Gupta
Engineering Manager
Expedia Inc


