Once you have a Hive job flow running on Amazon EMR, you'll have access to the
file system on the underlying EC2 machines (you'll get the machine names, etc.
once the cluster is running). You can then move your data files onto the EC2
machines' file system and load them into HDFS/Hive. I am not sure about the
Sqoop part.
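
For example, here is a minimal sketch of the "load it into HDFS" step, written
in Java since you mentioned your task is already in Java. The local file path,
HDFS directory, and table name below are placeholders I made up, and I haven't
run this on EMR myself:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Copies a data file from the master node's local disk into HDFS,
    // into a directory that a Hive table can be pointed at.
    public class LoadIntoHdfs {
        public static void main(String[] args) throws Exception {
            // Picks up the cluster's Hadoop configuration when run on the
            // master node.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Placeholder paths -- adjust to wherever you copied your file
            // and wherever your Hive table's data should live.
            Path local = new Path("/home/hadoop/mydata.csv");
            Path hdfsDir = new Path("/user/hive/warehouse/mytable/");

            fs.copyFromLocalFile(local, hdfsDir);
            fs.close();
        }
    }

Once the file is in HDFS, a CREATE EXTERNAL TABLE ... LOCATION
'/user/hive/warehouse/mytable/' (or a LOAD DATA INPATH statement) in Hive
should make it queryable. Treat this as a rough starting point, not a tested
recipe.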

From: Bhavesh Shah [mailto:bhavesh25s...@gmail.com]
Sent: Monday, April 23, 2012 8:42 AM
To: u...@hive.apache.org; dev@hive.apache.org
Subject: Doubts related to Amazon EMR


Hello all,
I want to deploy my task on Amazon EMR, but as I am new to Amazon Web Services,
I am confused about some of the concepts.

My Use Case:

I want to import a large amount of data from EC2 into Hive through Sqoop. The
imported data will be processed in Hive by applying some algorithm, producing
a result (as a table, in Hive only). The generated result will then be
exported back to EC2, again through Sqoop.

I am new to Amazon Web Services and want to implement this use case with the
help of AWS EMR. I have already implemented it on a local machine.

I have read some links about AWS EMR covering how to launch an instance, what
EMR is, how it works, and so on.

I have some doubts about EMR:

1) EMR uses S3 buckets, which hold the input and output data for Hadoop
processing (in the form of objects). ---> I don't understand how to store the
data in the form of objects on S3 (my data will be files).

2) As I already said, I have implemented the task for my use case in Java. If
I create a JAR of my program and create the job flow with a Custom JAR, will
it be possible to implement it like this, or do I need to do something extra?

3) As I said in my use case, I want to export my result back to EC2 with the
help of Sqoop. Does EMR have support for Sqoop?



If you have any ideas related to AWS, please reply with your answers as soon
as possible, as I want to get this working soon.

Many thanks.




--
Regards,
Bhavesh Shah
