Hi Todd,

Are you planning to use Sqoop to do this import? If not, you should. :) It will do a parallel import, using MapReduce, to load the table into Hadoop. With the --hive-import option, it will also create the Hive table definition.
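For example, a command along these lines should do it (the JDBC URL, credentials, table name and split column below are just placeholders to adjust for your Oracle instance):

  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username SCOTT \
    -P \
    --table MY_BIG_TABLE \
    --split-by ID \
    --num-mappers 8 \
    --hive-import

--split-by names a column (a numeric primary key works well) that Sqoop uses to divide the rows among the parallel map tasks, and --num-mappers controls how many tasks run (and therefore how many output files you end up with). Because --hive-import moves the data into Hive's warehouse directory and creates the table for you, you don't have to arrange the files or directories in HDFS yourself.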
Cheers,
Sarah

On Wed, Jul 7, 2010 at 5:51 PM, Todd Lee <[email protected]> wrote:
> Hi,
> I am new to Hive and Hadoop in general. I have a table in Oracle that has
> millions of rows, and I'd like to export it into HDFS so that I can run some
> Hive queries. My first question is: is it recommended to export the entire
> table as a single file (possibly 5 GB), or as several smaller files (say 10
> files of 500 MB each)? Also, does it matter if I put the files under
> different sub-directories before I do the data load in Hive, or does
> everything have to be under the same folder?
> Thanks,
> T
> p.s. I am sorry if this post is submitted twice.

--
Sarah Sproehnle
Educational Services
Cloudera, Inc
http://www.cloudera.com/training
