Hi,

Sorry to hijack this thread, but I am curious whether there is any other built-in
option to merge the files in a directory before loading the data into a table.

I have a directory on the local file system that contains many small files, and I
want to load them into a single Hive table. What would be the best approach to
this problem?

Thanks,
Ankita
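One possible workaround, not a built-in Hive option, is to concatenate the small files on the local side and then load the single result. The Java sketch below assumes plain newline-delimited text files that already match the target table's row format; the class name `MergeSmallFiles` and its `merge` method are hypothetical. It skips empty files' contents, sorts inputs by name for a deterministic result, and adds a newline after any file whose last line lacks one so rows from adjacent files are not glued together.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: concatenates small local text files into one file.
public class MergeSmallFiles {

    // Concatenates every regular file directly inside inputDir, in file-name
    // order, into outputFile. Assumes newline-delimited text files. Returns
    // the number of input files read (empty files are counted but add nothing).
    public static int merge(Path inputDir, Path outputFile) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(inputDir)) {
            for (Path p : dir) {
                // Skip subdirectories and never read the output file itself.
                if (Files.isRegularFile(p) && !p.equals(outputFile)) {
                    inputs.add(p);
                }
            }
        }
        inputs.sort(null); // natural Path ordering, so the output is deterministic

        try (OutputStream out = Files.newOutputStream(outputFile)) {
            for (Path p : inputs) {
                byte[] bytes = Files.readAllBytes(p); // acceptable for small files
                out.write(bytes);
                // Terminate the last record if the file did not end with '\n'.
                if (bytes.length > 0 && bytes[bytes.length - 1] != '\n') {
                    out.write('\n');
                }
            }
        }
        return inputs.size();
    }

    // Usage: java MergeSmallFiles <input-dir> <output-file>
    public static void main(String[] args) throws IOException {
        int n = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + n + " files into " + args[1]);
    }
}
```

Keep the output file outside the input directory. The merged file could then be loaded with something like `LOAD DATA LOCAL INPATH '/tmp/merged.txt' INTO TABLE my_table;` (path and table name are placeholders). Alternatively, the files could be loaded into a staging table and copied with `INSERT OVERWRITE ... SELECT` while `hive.merge.mapfiles` is enabled, letting Hive's own merge step combine them.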


-----Original Message-----
From: Namit Jain [[email protected]]
Sent: Monday, August 09, 2010 9:32 AM
To: [email protected]
Subject: RE: How to merge small files

Yes. It will try to run an additional map-reduce job to merge the files.
________________________________________
From: lei liu [[email protected]]
Sent: Monday, August 09, 2010 8:57 AM
To: [email protected]
Subject: Re: How to merge small files

Could you tell me whether the query will be slower if both parameters are set to true?

2010/8/9 Namit Jain <[email protected]>
That's right

________________________________________
From: lei liu [[email protected]]
Sent: Sunday, August 08, 2010 7:18 PM
To: [email protected]
Subject: Re: How to merge small files

Thank you for your reply.

Do you mean that I should execute the following statements:

statement.execute("set hive.merge.mapfiles=true");
statement.execute("set hive.merge.mapredfiles=true");

So both parameters should be set to true, right?

2010/8/6 Namit Jain <[email protected]>
  HIVEMERGEMAPFILES("hive.merge.mapfiles", true),
  HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false),


Set the above parameters to true before your query.
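Putting the pieces of this exchange together, a minimal JDBC sketch that sets both parameters and then runs the insert on the same connection might look like the following. It assumes a current HiveServer2 endpoint (`jdbc:hive2://localhost:10000/default`) and driver class `org.apache.hive.jdbc.HiveDriver`; the Hive JDBC driver of this era used a different driver class and the `jdbc:hive://` URL scheme. The class name `HiveMergeQuery` and the table `source_table` are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class: runs a query with Hive's small-file merge enabled.
public class HiveMergeQuery {

    // The two settings from this thread: merge the output of map-only jobs
    // (hive.merge.mapfiles) and of jobs with a reduce phase (hive.merge.mapredfiles).
    static List<String> mergeSettings() {
        return Arrays.asList(
                "set hive.merge.mapfiles=true",
                "set hive.merge.mapredfiles=true");
    }

    // Applies the settings and then runs the query on the same connection,
    // because "set" only affects the current Hive session.
    static void runWithMerge(Connection conn, String query) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String setting : mergeSettings()) {
                stmt.execute(setting);
            }
            stmt.execute(query);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumptions: HiveServer2 on localhost:10000 and the Hive JDBC driver
        // on the classpath; table names are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "")) {
            runWithMerge(conn,
                    "INSERT OVERWRITE TABLE tablename1 SELECT * FROM source_table");
        }
    }
}
```

Expect the query to take somewhat longer when the merge runs, since Hive launches an extra map-reduce job to perform it.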



________________________________________
From: lei liu [[email protected]]
Sent: Thursday, August 05, 2010 8:47 PM
To: [email protected]
Subject: How to merge small files

When I run the following SQL:

  INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement

many zero-size files are written to Hadoop.

How can I merge these small files?

Thanks,
LiuLei