How Can I store the Hive query result in one file ?

2013-07-04 Thread Matouk IFTISSEN
Hello Hive users,
Is there a way to store the result of a Hive query (SELECT * ...) in a
single, specific file whose name I choose, similar to INSERT OVERWRITE LOCAL
DIRECTORY '/directory_path_name/'?
Thanks for your answers.


Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Nitin Pawar
Would hive -e "query" > filename or hive -f query.q > filename do?

Or do you specifically want it written to a named file on HDFS only?
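
A minimal sketch of the redirection suggested above, assuming the hive CLI is on the PATH. As far as I know, the CLI prints result rows to stdout and its progress messages to stderr, so the redirected file should contain only the rows; -S (silent mode) is added to cut down on chatter. The function name, the HIVE_BIN variable and the table in the usage comment are hypothetical.

```shell
#!/bin/sh
# HIVE_BIN is a hypothetical override so another binary (or a stub) can be used.
HIVE_BIN="${HIVE_BIN:-hive}"

# query_to_file QUERY OUTFILE
# Runs QUERY through "hive -S -e" and redirects stdout to the local OUTFILE.
query_to_file() {
    "$HIVE_BIN" -S -e "$1" > "$2"
}

# Usage (not executed here; "customers" is an assumed table name):
#   query_to_file "SELECT * FROM customers" /tmp/customers.tsv
```

This writes to the local filesystem of the machine running the CLI, not to HDFS, and every row passes through that single client process.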


-- 
Nitin Pawar


Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Bertrand Dechoux
The question is how large your output is. There is one file per output task
(map or reduce) because that way each task can write its output
independently and in parallel. That's how MapReduce works. Short of forcing
the number of tasks to 1, there is no sure way to get a single output file.

But indeed, if the volume is low enough, you could also capture the standard
output into a local file, as Nitin described.

Bertrand


-- 
Bertrand Dechoux


Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Michael Malak
I have found that for output larger than a few GB, redirecting stdout results 
in an incomplete file.  For very large output, I do CREATE TABLE MYTABLE AS 
SELECT ... and then copy the resulting HDFS files directly out of 
/user/hive/warehouse.
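
The CREATE TABLE ... AS SELECT route can be sketched roughly as follows. It assumes the default warehouse location /user/hive/warehouse mentioned above and uses hadoop fs -getmerge, which concatenates the files of an HDFS directory into one local file. The function name, the three override variables, the table name and the paths are hypothetical; the warehouse directory name may also differ from the table name in case.

```shell
#!/bin/sh
# Hypothetical overrides so other binaries or a different warehouse path can be used.
HIVE_BIN="${HIVE_BIN:-hive}"
HADOOP_BIN="${HADOOP_BIN:-hadoop}"
WAREHOUSE="${WAREHOUSE:-/user/hive/warehouse}"

# ctas_to_local TABLE SELECT_STATEMENT LOCAL_FILE
# 1. Materialise the query result as a table.
# 2. Merge the table's HDFS files into a single local file.
ctas_to_local() {
    table="$1"
    select_stmt="$2"
    local_file="$3"
    "$HIVE_BIN" -e "CREATE TABLE ${table} AS ${select_stmt}" || return 1
    "$HADOOP_BIN" fs -getmerge "${WAREHOUSE}/${table}" "$local_file"
}

# Usage (not executed here):
#   ctas_to_local mytable "SELECT * FROM customers" /tmp/mytable.txt
```

The table stays in the warehouse afterwards, so it has to be dropped separately once the copy is done.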
 



Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Matouk IFTISSEN
Thanks for your responses.
Effectively, Bertrand's answer makes this possible: the Hive properties
below force the job to write the Hive result to a single file, without
specifying its name (_0):

 set hive.exec.reducers.max = 1;
 set mapred.reduce.tasks = 1;

Nitin, I want to store the results of the SELECT, not the stdout (log) of
the query execution. Does your suggestion apply to the results of the SELECT?
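
A small sketch of how the two properties above might be combined with the INSERT OVERWRITE LOCAL DIRECTORY statement from the original question. It only generates a script for hive -f; the function name, output directory and example query are hypothetical. Note that these settings only limit the reduce stage, so a query that runs as a map-only job may still produce one file per mapper.

```shell
#!/bin/sh
# single_file_script OUTPUT_DIR SELECT_STATEMENT
# Prints a HiveQL script that caps the job at one reducer, then runs the query.
single_file_script() {
    cat <<EOF
set hive.exec.reducers.max = 1;
set mapred.reduce.tasks = 1;
INSERT OVERWRITE LOCAL DIRECTORY '$1'
$2;
EOF
}

# Usage (not executed here; table and column names are assumed):
#   single_file_script /tmp/out \
#       "SELECT city, count(*) FROM customers GROUP BY city" > one_file.q
#   hive -f one_file.q
```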





Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Nitin Pawar
The approach I suggested does not work on HDFS files. It is just one way to
write the stdout to a local file.

I am not sure whether Hive lets you name output files, and the settings
above will make your query run really slowly if you have a large dataset.

If you really need a specific file name, I am not aware of Hive supporting
that for now. I did a quick search but did not find anything useful.
If you need a quick way to get to your solution, Pig supports the store
function, and its output is written to a named file.

I will search in more depth and see whether there is anything in Hive's
configuration.


-- 
Nitin Pawar


Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Edward Capriolo
Normally, if you use set mapred.reduce.tasks=1, you get one output file. You
can also look at hive.merge.mapfiles, mapred.reduce.tasks and
hive.merge.mapredfiles. You can also use a separate tool:
https://github.com/edwardcapriolo/filecrush
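
A hedged sketch of switching on the merge properties mentioned above. To my knowledge the reduce-side property is spelled hive.merge.mapredfiles in Hive's configuration. The helper name and the usage are hypothetical, and merging reduces the number of small output files rather than guaranteeing exactly one.

```shell
#!/bin/sh
# merge_settings
# Prints the "set" lines that ask Hive to merge small output files
# after map-only jobs and after map-reduce jobs.
merge_settings() {
    cat <<'EOF'
set hive.merge.mapfiles = true;
set hive.merge.mapredfiles = true;
EOF
}

# Usage (not executed here; query.q is an assumed existing script):
#   { merge_settings; cat query.q; } > merged_query.q
#   hive -f merged_query.q
```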



Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop
 

 hive> set hive.io.output.fileformat=CSVTextFile;
 hive> insert overwrite local directory '/usr/home/hadoop/da1/' select * from customers;

(customers is a Hive table.)




Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop


Adding to that:

- The multiple files in the output directory can be concatenated, for example:
  cat 0-0 00-1 0-2 > final
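
The concatenation step can be sketched as a small helper that does not depend on the exact part-file names. The function name and the paths in the usage comment are hypothetical; the output file should live outside the directory being read.

```shell
#!/bin/sh
# concat_parts DIR OUTFILE
# Concatenates every regular, non-hidden file in DIR, in the shell's sorted
# glob order, into OUTFILE. Hidden files (names starting with a dot) are skipped.
concat_parts() {
    dir="$1"
    out="$2"
    : > "$out"                      # start from an empty output file
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            cat "$f" >> "$out"
        fi
    done
}

# Usage (not executed here):
#   concat_parts /usr/home/hadoop/da1 /usr/home/hadoop/final
```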



