Thanks. But is it going to create one big file in HDFS? I am currently considering writing my own Cascading job for this.
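(Assuming a Sqoop-based import with its defaults, it should not produce one big file: each map task writes its own part file (part-m-00000, part-m-00001, ...), so the -m flag controls how many files land in HDFS. A minimal sketch, with hypothetical host, credentials, table, and split column:

    # Hypothetical connection details; adjust the host, service name,
    # user, table, and split column for your Oracle instance.
    # -m 10 runs ten parallel map tasks, yielding ten part files of
    # roughly equal size instead of one 5 GB file.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username scott -P \
      --table MYTABLE \
      --split-by ID \
      -m 10 \
      --hive-import

With --hive-import, Sqoop also creates the matching Hive table definition, as Sarah notes below.)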
thx, T

On Wed, Jul 7, 2010 at 6:06 PM, Sarah Sproehnle <[email protected]> wrote:
> Hi Todd,
>
> Are you planning to use Sqoop to do this import? If not, you should.
> :) It will do a parallel import, using MapReduce, to load the table
> into Hadoop. With the --hive-import option, it will also create the
> Hive table definition.
>
> Cheers,
> Sarah
>
> On Wed, Jul 7, 2010 at 5:51 PM, Todd Lee <[email protected]> wrote:
> > Hi,
> > I am new to Hive and Hadoop in general. I have a table in Oracle that
> > has millions of rows and I'd like to export it into HDFS so that I can
> > run some Hive queries. My first question is: is it recommended to
> > export the entire table as a single file (possibly 5 GB), or as several
> > smaller files (10 files of 500 MB each)? Also, does it matter if I put
> > the files under different sub-directories before I do the data load in
> > Hive, or does everything have to be under the same folder?
> > Thanks,
> > T
> > p.s. I am sorry if this post is submitted twice.
>
> --
> Sarah Sproehnle
> Educational Services
> Cloudera, Inc
> http://www.cloudera.com/training
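(On the layout half of the original question: Hive treats a table as a directory of files and reads every file sitting directly under it, so ten 500 MB files and one 5 GB file behave the same at query time. By default, though, it does not recurse into nested sub-directories, so it is safest to keep all the files in the table's one folder. A sketch with hypothetical table, column, and path names:

    # Hypothetical external table over an HDFS directory that already
    # holds the exported files; Hive reads all files directly under
    # LOCATION, regardless of how many there are.
    hive -e "CREATE EXTERNAL TABLE my_table (id INT, name STRING)
             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
             LOCATION '/user/todd/my_table';"

For MapReduce, a handful of files around the HDFS block size is generally preferable to one huge file or to very many tiny ones.)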
