Hi,

HDFS is better compared with something like ext3 than with MySQL. You can use 
the "hadoop fs" commands to look at the files on HDFS just like you would look 
at the "/mysql" dir on ext3. Internally, HDFS splits these files into blocks of 
64 MB (configurable), and each block ends up as a file on the underlying Linux 
filesystem. You can configure the location of these blocks in a config file 
called hdfs-site.xml with a property called "dfs.data.dir", which defaults to 
"${hadoop.tmp.dir}/dfs/data".
I doubt there are many use cases where looking at these individual blocks is 
useful, though.
If you want to see how much space something is using, use something like this:
hadoop fs -du /user/hive/warehouse/

Just keep in mind that if your setup has a replication factor of 3, you are 
using roughly 3x the physical space that the -du command reports.
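
If you really want to see where the blocks of a particular file physically 
live, fsck can list the block IDs and the datanodes holding them. A quick 
example (the table path here is made up):

hadoop fsck /user/hive/warehouse/your_table -files -blocks -locations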

I hope that helps.

Bennie.

________________________________
From: vaibhav negi [mailto:[email protected]]
Sent: Tuesday, July 27, 2010 8:04 AM
To: [email protected]
Subject: Re: HIVE: How to Load CSV File?

Hi ,

By actual physical path, I mean the full path in the Linux directory tree. For 
MySQL, for example, there is a /mysql directory; inside it I can see files for 
the individual tables and also what lies inside those files.



Vaibhav Negi

2010/7/26 Alex Rovner <[email protected]>
The hadoop fs -du command will show you the size of the files. What do you mean 
by physical?

Sent from my iPhone

On Jul 26, 2010, at 6:43 AM, "vaibhav negi" <[email protected]> wrote:
Hi,

The hadoop dfs command shows the logical path /user/hive/warehouse. How can I 
see where this directory exists physically?



Vaibhav Negi

On Mon, Jul 26, 2010 at 2:45 PM, Amogh Vasekar <[email protected]> wrote:
Hi,
The default HWI (Hive web interface) provides some basic metadata, but I don't 
think file sizes are included. In any case, you can query using the common 
hadoop dfs commands. The default warehouse directory is set in your Hive conf 
XML.
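
For example, the location is controlled by the hive.metastore.warehouse.dir 
property (its usual default is shown below), and you can then inspect it with 
the dfs commands; "your_table" is just a placeholder:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>

hadoop dfs -ls /user/hive/warehouse
hadoop dfs -du /user/hive/warehouse/your_table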

Amogh



On 7/26/10 2:30 PM, "vaibhav negi" <[email protected]> wrote:
Hi,

Thanks, Amogh.
How can I browse the actual physical location of Hive tables, just like I can 
see MySQL tables in the mysql directory? I want to check the actual disk space 
consumed by Hive tables.



Vaibhav Negi


On Mon, Jul 26, 2010 at 1:55 PM, Amogh Vasekar <[email protected]> wrote:
Hi,
You can create an external table pointing to data already on HDFS, specifying 
the delimiter:
CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
                    page_url STRING, referrer_url STRING,
                    ip STRING COMMENT 'IP Address of the User',
                    country STRING COMMENT 'country of origination')
    COMMENT 'This is the staging page view table'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
    STORED AS TEXTFILE
    LOCATION '/user/data/staging/page_view';
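
If the file is a plain comma-separated CSV and you would rather have Hive 
manage a copy of it in the warehouse, a rough sketch would be something like 
this (table and file names are made up):

CREATE TABLE page_view_csv(viewTime INT, userid BIGINT,
                    page_url STRING, referrer_url STRING,
                    ip STRING, country STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/tmp/page_views.csv' INTO TABLE page_view_csv;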

See http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables for more.

HTH,
Amogh


On 7/26/10 1:02 PM, "vaibhav negi" <[email protected]> wrote:
Hi,

Is there some way to load a CSV file into Hive?

Vaibhav Negi


