Hi Todd,

Are you planning to use Sqoop to do this import? If not, you should. :) It will do a parallel import, using MapReduce, to load the table into Hadoop. With the --hive-import option, it will also create the Hive table definition.
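For example, a command along these lines should do it (the JDBC URL, credentials, table name and split column below are just placeholders to adjust for your Oracle instance):

  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username SCOTT \
    -P \
    --table MY_BIG_TABLE \
    --split-by ID \
    --num-mappers 8 \
    --hive-import

--split-by names a column (a numeric primary key works well) that Sqoop uses to divide the rows among the parallel map tasks, and --num-mappers controls how many tasks run (and therefore how many output files you end up with). Because --hive-import moves the data into Hive's warehouse directory and creates the table for you, you don't have to arrange the files or directories in HDFS yourself.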
Cheers,
Sarah

On Wed, Jul 7, 2010 at 5:51 PM, Todd Lee <[email protected]> wrote:
> Hi,
> I am new to Hive and Hadoop in general. I have a table in Oracle that has
> millions of rows, and I'd like to export it into HDFS so that I can run some
> Hive queries. My first question is: is it recommended to export the entire
> table as a single file (possibly 5 GB), or as several smaller files (say 10
> files of 500 MB each)? Also, does it matter if I put the files under
> different sub-directories before I do the data load in Hive, or does
> everything have to be under the same folder?
> Thanks,
> T
> p.s. I am sorry if this post is submitted twice.

--
Sarah Sproehnle
Educational Services
Cloudera, Inc
http://www.cloudera.com/training
