Question regarding nested complex data type

2013-06-20 Thread neha
Hi All,

I have 2 questions about complex data types in nested composition.

1) I did not find a way to provide delimiter information in the DDL if one or
more columns have a nested array/struct. In this case, the default delimiters
have to be used for the complex type column.
Please let me know if this is a limitation as of now or if I am missing
something.

e.g.:
*DDL*:
hive> create table example(col1 int, col2 array<struct<st1:int,st2:string>>)
row format delimited fields terminated by ',';
OK
Time taken: 0.226 seconds

*Sample data loaded:*
1,1^Cstring1^B2^Cstring2

*O/P:*
hive> select * from example;
OK
1	[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
Time taken: 0.288 seconds

2) For the same DDL given above, if we provide the clause *collection items
terminated by '|'* but still use the default delimiters in the data file (since
there is no way to use the given delimiter '|'), then the select query shows
incorrect data. Please let me know if this is expected.

e.g.
*DDL*:
hive> create table example(col1 int, col2 array<struct<st1:int,st2:string>>)
row format delimited fields terminated by ',' collection items terminated by '|';
OK
Time taken: 0.175 seconds

*Sample data loaded:*
1,1^Cstring1^B2^Cstring2

*O/P:*
hive> select * from example;
OK
1	[{"st1":1,"st2":"string1\u00022"}]
Time taken: 0.141 seconds
Thanks & Regards.


Re: show table throwing strange error

2013-06-20 Thread Mohammad Tariq
Thank you for the response ma'am. It didn't help either.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Thu, Jun 20, 2013 at 8:43 AM, Sunita Arvind sunitarv...@gmail.com wrote:

 Your issue seems familiar. Try logging out of hive session and re-login.

 Sunita


 On Wed, Jun 19, 2013 at 8:53 PM, Mohammad Tariq donta...@gmail.com wrote:

 Hello list,

  I have a hive (0.9.0) setup on my Ubuntu box running
 hadoop-1.0.4. Everything was going smoothly till now. But today when I issued
 *show tables* I got some strange error on the CLI. Here is the error:

 hive> show tables;
 FAILED: Parse Error: line 1:0 character '' not supported here
 line 1:1 character '' not supported here
 line 1:2 character '' not supported here
 line 1:3 character '' not supported here
 line 1:4 character '' not supported here
 [... the same "character '' not supported here" message repeats for every character position through line 1:377 ...]
 line 1:378 character '' not supported here
 line 1:379 character '' not supported here
 line 1:380 character '' not supported here
 line 1:381 character '' not supported here

 Strangely, other queries like *select foo from pokes where bar = 'tariq';* are
 working fine. Tried to search over the net but could not find anything
 useful. Need some help.

 Thank you so much for your time.

 Warm Regards,
 Tariq
 cloudfront.blogspot.com





Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
We have a few dozen files that need to be made available to all
mappers/reducers in the cluster while running hive transformation steps.

It seems that add archive does not unarchive the entries and thus make them
available directly on the default file path - and that is what we are
looking for.

To illustrate:

   add file modelfile.1;
   add file modelfile.2;
   ..
   add file modelfile.N;

  Then, our model that is invoked during the transformation step *does* have
correct access to its model files in the default path.

But .. those model files take low *minutes* to all load..

Instead, when we try:
   add archive modelArchive.tgz;

the problem is that the archive apparently does not get exploded.

I have an archive, for example, that contains shell scripts under the hive
directory stored inside. I am *not* able to access hive/my-shell-script.sh
after adding the archive. Specifically, the following fails:

$ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
-rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46
appminer/bin/launch-quixey_to_xml.sh

from (select transform (aappname,qappname)
*using* '*hive/parse_qx.py*' as (aappname2 string, qappname2 string) from
eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

Cannot run program "hive/parse_qx.py": java.io.IOException: error=2,
No such file or directory


Re: Hive select shows null after successful data load

2013-06-20 Thread Stephen Sprague
hooray!  over one hurdle and onto the next one.  So something about that
one nested array caused the problem.  Very strange. I wonder if there is a
smaller test case to look at, as it seems not all arrays break it, since i
see one for the attribute values.

As to the formatting issue, i don't believe the native hive client has much
to offer there - it's bare bones and record oriented.  Beeline seems to be
another open-source hive client which looks to have more options; you might
have a gander at that, though i don't think it has anything special for
pretty printing arrays, maps or structs - but i could be wrong.

And then of course there's nothing stopping you from piping that
gnarly stuff into python (or whatever) and having it come out the other end
all nice and pretty -- and then posting that here. :)
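
A bare-bones sketch of that piping idea (assumptions: the CLI output is one row
per line with tab-separated columns, and the script name pretty.py is only
illustrative):

#!/usr/bin/env python
# pretty.py - read hive CLI output on stdin and print each column on its own line
import sys

for row_num, line in enumerate(sys.stdin, 1):
    columns = line.rstrip("\n").split("\t")   # hive CLI separates columns with tabs
    sys.stdout.write("--- row %d ---\n" % row_num)
    for col in columns:
        sys.stdout.write(col + "\n")

e.g.  hive -e 'select ...' | python pretty.py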


On Wed, Jun 19, 2013 at 7:54 PM, Sunita Arvind sunitarv...@gmail.com wrote:

 Finally I could get it to work. The issue resolved once I removed the arrays
 within the position structure. So that is the limitation of the serde. I
 changed 'industries' to string and 'jobfunctions' to Map<string,string>, and I
 can query the table just fine now. Here is the complete DDL for reference:

 create external table linkedin_Jobsearch (
 jobs STRUCT<
   values : ARRAY<STRUCT<
     company : STRUCT<
       id : STRING,
       name : STRING>,
     postingDate : STRUCT<
       year : STRING,
       day : STRING,
       month : STRING>,
     descriptionSnippet : STRING,
     expirationDate : STRUCT<
       year : STRING,
       day : STRING,
       month : STRING>,
     position : STRUCT<
       jobFunctions : MAP<STRING,STRING>, --these were arrays of structure in my previous attempts
       industries : STRING,
       title : STRING>,
     jobType : STRUCT<
       code : STRING,
       name : STRING>,
     experienceLevel : STRUCT<
       code : STRING,
       name : STRING>,
     id : STRING,
     customerJobCode : STRING,
     skillsAndExperience : STRING,
     salary : STRING,
     jobPoster : STRUCT<
       id : STRING,
       firstName : STRING,
       lastName : STRING,
       headline : STRING>,
     referralBonus : STRING,
     locationDescription : STRING
   >>
 >
 )
 ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
 LOCATION '/user/sunita/tables/jobs';

 Thanks Stephen for sharing your thoughts. It helped.

 Also, if someone / Stephen could help me display this information in a
 useful manner, that would be great. Right now all the values show up as
 arrays. Here is what I mean:
 For a query like this:
 hive> select jobs.values.company.name, jobs.values.position.title,
 jobs.values.locationdescription from linkedin_jobsearch;

 This is the output:

 [CyberCoders,CyberCoders,CyberCoders,Management Science
 Associates,Google,Google,CyberCoders,CyberCoders,HP,Sigmaways,Global
 Data Consultancy,Global Data
 Consultancy,CyberCoders,CyberCoders,CyberCoders,VMware,CD IT
 Recruitment,CD IT Recruitment,Digital Reasoning Systems,AOL]
 [Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics,Software
 Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics,Software
 Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics,Data
 Architect,Systems Engineer, Site Reliability Engineering,Systems
 Engineer, Site Reliability Engineering,NoSQL Engineer - MongoDB for big
 data, web crawling - RELO OFFER,NoSQL Engineer - MongoDB for big data,
 web crawling - RELO OFFER,Hadoop Database Administrator Medicare,Hadoop
 / Big Data Consultant,Lead Hadoop developer,Head of Big Data -
 Hadoop,Hadoop Engineer - Hadoop, Operations, Linux Admin, Java,
 Storage,Sr. Hadoop Administrator - Hadoop, MapReduce, HDFS,Sr. Hadoop
 Administrator - Hadoop, MapReduce, HDFS,Software Engineer - Big
 Data,Hadoop Team Lead Consultant - Global Leader in Big Data
 solutions,Hadoop Administrator Consultant - Global Leader in Big Data
 solutions,Java Developer,Sr.Software Engineer-Big Data-Hadoop]
 [Pittsburgh, PA,Pittsburgh, PA,Harrisburg, PA,Pittsburgh, PA
 (Shadyside area near Bakery Square),Pittsburgh, PA, USA,Pittsburgh,
 PA,Cleveland, OH,Akron, OH,Herndon, VA,Cupertino, CA,London,
 United Kingdom,London, United Kingdom,Mountain View, CA,san jose,
 CA,Santa Clara, CA,Palo Alto, CA,Home based - Live anywhere in the UK
 or Benelux,Home based - Live anywhere in the UK or Benelux,Herndon,
 VA,Dulles, VA]
 Time taken: 8.518 seconds

 All company names come into an array, all position titles into another
 array and all locationdescription into yet another array. I cannot map 1
 value to the other.

 The below query gives a decent output where individual columns can be
 somewhat mapped:

 hive> select jobs.values[0].company.name, jobs.values[0].position.title,
 jobs.values[0].locationdescription from linkedin_jobsearch;

 CyberCoders Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica
 Analytics  Pittsburgh, PA
 Time taken: 8.543 seconds

 But if I want to get the whole list this does not work. I have tried
 setting Input and output formats and setting serde properties to map to
 columns, but the output is the same. I haven't tried LATERAL VIEW
 json_tuple as yet, I found it cryptic and I hope there is something simpler.
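
One possibly simpler option to try first (a sketch only, assuming the
linkedin_jobsearch table above and that the explode() UDTF is available in this
Hive build): flatten jobs.values with a LATERAL VIEW so each job becomes its own
row and the company, title and location stay lined up:

hive> select job.company.name, job.position.title, job.locationdescription
    > from linkedin_jobsearch
    > lateral view explode(jobs.values) v as job;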

 I can think of writing a UDF which 

Re: Question regarding nested complex data type

2013-06-20 Thread Stephen Sprague
it's all there in the documentation under CREATE TABLE, and it seems you
got everything right too, except one little thing - in your second example,
for 'sample data loaded', change that '^B' to '|' and
you should be good. That's the delimiter that separates your two array
elements - i.e. the collection items.

i guess the real question for me is: when you say 'since there is no way to
use the given delimiter |', what did you mean by that?
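
For illustration, with the second DDL (collection items terminated by '|') the
sample row would then be written as:

1,1^Cstring1|2^Cstring2

i.e. '|' between the two array elements and the default '^C' still separating
the struct fields (a sketch - worth verifying against your own data).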



On Thu, Jun 20, 2013 at 1:42 AM, neha ms.nehato...@gmail.com wrote:




Re: Question regarding nested complex data type

2013-06-20 Thread neha
Thanks a lot for your reply, Stephen.
To answer your question - I was not aware of the fact that we could use a
delimiter (in my example, '|') for the first level of nesting. I tried it now
and it worked fine.

My next question - is there any way to provide a delimiter in the DDL for the
second level of nesting?
Thanks again!!

On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague sprag...@gmail.com wrote:






Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Sprague
what would be interesting would be to run a little experiment and find out
what the default PATH is on your data nodes.  How much of a pain would it
be to run a little python script that prints to stderr the value of the
environment variables $PATH and $PWD (or the shell command 'pwd')?

that's of course going through the normal channels of add file.

the thing is, given you're using a relative path "hive/parse_qx.py", you
need to know what the current directory is when the process runs on the
data nodes.
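
A minimal sketch of such a probe (the name probe_env.py is only illustrative;
it echoes its input so the transform still emits rows, and writes the
environment to stderr so it shows up in the task logs):

#!/usr/bin/env python
import os, sys

# report where the task is actually running
sys.stderr.write("PATH=%s\n" % os.environ.get("PATH", ""))
sys.stderr.write("PWD=%s\n" % os.getcwd())
sys.stderr.write("cwd contents: %s\n" % ", ".join(os.listdir(".")))

# pass rows through unchanged so the transform step still produces output
for line in sys.stdin:
    sys.stdout.write(line)

added with 'add file probe_env.py;' and run with ... using 'probe_env.py' ...
in place of the real script.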




On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch java...@gmail.com wrote:








Re: Question regarding nested complex data type

2013-06-20 Thread Stephen Sprague
you only get three: the field separator, the array element separator (aka
collection delimiter), and the map key/value separator (aka map key
delimiter).

when you nest deeper than that you have to use the defaults '^D', '^E', etc. for
each level.  At least that's been my experience, which i've found has worked
successfully.
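
For reference, a sketch of a DDL that spells out all three (a hypothetical
table, just to show the clauses; anything nested deeper than these falls back
to the '^D', '^E' defaults):

hive> create table example2(
    >   col1 int,
    >   col2 array<struct<st1:int,st2:string>>,
    >   col3 map<string,string>)
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '|'
    > map keys terminated by ':';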


On Thu, Jun 20, 2013 at 7:45 AM, neha ms.nehato...@gmail.com wrote:







Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
@Stephen: given that the 'relative' path for hive is from a local downloads
directory on each local tasktracker in the cluster, it was my thought that
if the archive were actually being expanded then
somedir/somefileinthearchive should work.  I will go ahead and test this
assumption.

In the meantime, is there any facility available in hive for making
archived files available to hive jobs?  archive, or hadoop archive (har),
etc.?


2013/6/20 Stephen Sprague sprag...@gmail.com








Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Sprague
i personally only know of adding a .jar file via add archive, but my
experience there is very limited.  i believe if you 'add file' and the file
is a directory it'll recursively take everything underneath, but i know of
nothing that inflates or untars things on the remote end automatically.

i would 'add file' your python script and then, within that, untar your
tarball to get at your model data. it's just a matter of figuring out the
path to that tarball, which is kinda up in the air when it's added via 'add
file'.  Yeah, local downloads directory - what the literal path is, is
what i'd like to know. :)
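
A rough sketch of that idea (assuming the tarball was shipped alongside the
script with 'add file modelArchive.tgz' and so lands in the task's working
directory; all names are illustrative):

#!/usr/bin/env python
import os, sys, tarfile

# unpack the shipped model archive into the task's working directory, once per task
if not os.path.isdir("model"):
    tgz = tarfile.open("modelArchive.tgz")
    tgz.extractall("model")   # model files end up under ./model/...
    tgz.close()

# ... load the model files from ./model and process stdin -> stdout as usual
for line in sys.stdin:
    sys.stdout.write(line)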


On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch java...@gmail.com wrote:










Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
thx for the tip on add file where the file is a directory. I will try that.


2013/6/20 Stephen Sprague sprag...@gmail.com










Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Sprague
yeah.  the archive isn't unpacked on the remote side. I think add archive
is mostly used for finding java packages since CLASSPATH will reference the
archive (and as such there is no need to expand it.)


On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch java...@gmail.com wrote:











Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
Stephen: would you be willing to share an example of specifying a
directory as the add file target?  I have not seen this working.

I have attempted to use it as follows:

*We will access a script within the hivetry directory located here:*
hive> ! ls -l /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
-rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37
/opt/am/ver/1.0/hive/hivetry/classifier_wf.py

*Add the directory to hive:*
hive> add file /opt/am/ver/1.0/hive/hivetry;
Added resource: /opt/am/ver/1.0/hive/hivetry

*Attempt to run the transform query using that script:*

*Attempt one: use the script name unqualified:*

hive> from (select transform (aappname,qappname) using
'classifier_wf.py' as (aappname2 string, qappname2 string) from eqx )
o insert overwrite table c select o.aappname2, o.qappname2;

(Failed:   Caused by: java.io.IOException: Cannot run program
"classifier_wf.py": java.io.IOException: error=2, No such file or
directory)


*Attempt two: use the script name with the directory name prefix:*
hive> from (select transform (aappname,qappname) using
'hive/classifier_wf.py' as (aappname2 string, qappname2 string) from
eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

(Failed:   Caused by: java.io.IOException: Cannot run program
"hive/classifier_wf.py": java.io.IOException: error=2, No such file or
directory)




2013/6/20 Stephen Sprague sprag...@gmail.com












Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Ramki Palle
In *Attempt two*, are you not supposed to use hivetry as the
directory?

Maybe you should try giving the full path
/opt/am/ver/1.0/hive/hivetry/classifier_wf.py and see if it works.

Regards,
Ramki.


On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch java...@gmail.com wrote:



Hive External Table issue

2013-06-20 Thread sanjeev sagar
Hello Everyone, I'm running into the following Hive external table issue.



hive> CREATE EXTERNAL TABLE access(
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE
LOCATION
'/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';

FAILED: Error in metadata:
MetaException(message:hdfs://h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
is not a directory or unable to create one)

FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask





In HDFS: file exists

hadoop fs -ls
/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
Found 1 items
-rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033



I've downloaded the serde2 jar file too and installed it as
/usr/lib/hive/lib/hive-json-serde-0.2.jar, and I've bounced all the hadoop
services after that.

I even added the jar file manually in hive and ran the above sql, but it is
still failing.

hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar;
Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path
Added resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar



Any help would be highly appreciated.



-Sanjeev









-- 
Sanjeev Sagar

***Separate yourself from everything that separates you from others
! - Nirankari
Baba Hardev Singh ji *

**


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
Good eyes Ramki!  thanks - the directory in place of the filename appears to
be working.  The script is getting loaded now using Attempt two, i.e.
'hivetry/classifier_wf.py' as the script path.
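
For the archives, the working combination (a sketch pieced together from the
commands above) would then be:

hive> add file /opt/am/ver/1.0/hive/hivetry;
hive> from (select transform (aappname,qappname)
    >   using 'hivetry/classifier_wf.py' as (aappname2 string, qappname2 string)
    >   from eqx ) o
    > insert overwrite table c select o.aappname2, o.qappname2;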

thanks again.

stephenb


2013/6/20 Ramki Palle ramki.pa...@gmail.com


Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
MetaException(message:hdfs://h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
is not a directory or unable to create one)


it clearly says it's not a directory. Point to the directory and it will work.


On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar sanjeev.sa...@gmail.com wrote:





-- 
Nitin Pawar


Re: Hive External Table issue

2013-06-20 Thread sanjeev sagar
I did mention in my mail that the hdfs file exists in that location. See below:

In HDFS: file exists

hadoop fs -ls
/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
Found 1 items
-rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

so the directory and the file both exist.


On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar nitinpawar...@gmail.com wrote:





-- 
Sanjeev Sagar

***Separate yourself from everything that separates you from others
! - Nirankari
Baba Hardev Singh ji *

**


Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
In Hive, when you create a table and use LOCATION to refer to an HDFS path, that
path is supposed to be a directory.
If the directory does not exist, Hive will try to create it, and if the path is a
file it will throw an error because it is not a directory.

That is the error you are getting: the location you referred to is a file.
Change it to the directory and see if that works for you.
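
For example, a minimal sketch of the same DDL with LOCATION pointing at the parent
directory rather than at the file (columns, SerDe and regex are unchanged from the
original statement; only the path at the end differs):

hive> CREATE EXTERNAL TABLE access(
        host STRING,
        identity STRING,
        user STRING,
        time STRING,
        request STRING,
        status STRING,
        size STRING,
        referer STRING,
        agent STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
      WITH SERDEPROPERTIES (
        "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
        "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
      )
      STORED AS TEXTFILE
      LOCATION '/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/';

Hive will then read every file under that directory (here just FlumeData.1371144648033).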


-- 
Nitin Pawar


Re: Hive External Table issue

2013-06-20 Thread sanjeev sagar
Two issues:

1. I've created external tables in Hive based on a file location before, and
it worked without any issue. It doesn't have to be a directory.

2. If there is more than one file in the directory and you create the
external table based on the directory, then how does the table know which file
it needs to look at for the data?

I tried to create the table based on the directory; it created the table, but
all the rows were NULL.

-Sanjeev



-- 
Sanjeev Sagar

Separate yourself from everything that separates you from others! - Nirankari Baba Hardev Singh ji


Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
Mark has answered this before
http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil

If this link does not answer your question, do let us know


-- 
Nitin Pawar


Re: Hive External Table issue

2013-06-20 Thread Ramki Palle
1. I was under the impression that you cannot point the table location to a
file, but it looks like it works. Please see the discussion in the thread
http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/%3c556325346ca26341b6f0530e07f90d96017084360...@gbgh-exch-cms.sig.ads%3e

2. If there is more than one file in the directory, your query gets the
data from all the files in that directory.

In your case, the regex may not be parsing the data properly.
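
A quick way to check, as a rough sketch (assuming the external table from the DDL
earlier in this thread, named access, now pointing at the directory), is to probe a
few rows and see whether the SerDe populates the columns or returns NULLs:

hive> SELECT host, status, request FROM access LIMIT 5;

If every column comes back NULL, the input.regex is not matching the log lines, and
that needs fixing before anything else.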

Regards,
Ramki.





Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
Also see this JIRA
https://issues.apache.org/jira/browse/HIVE-951

I think the issue you are facing is due to that JIRA.


-- 
Nitin Pawar


unsubscribe

2013-06-20 Thread Neerja Bhatnagar



Re: Hive External Table issue

2013-06-20 Thread Ramki Palle
Nitin,

Can you go through the thread with subject "S3/EMR Hive: Load contents of a
single file" on Tue, 26 Mar, 17:11 at

http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1

This gives the whole discussion about the topic of the table location pointing
to a filename vs. a directory.

Can you give your insight from this discussion and the discussion you
mentioned at the stackoverflow link?

Regards,
Ramki.




Re: Hive External Table issue

2013-06-20 Thread Stephen Sprague
i agree.

conclusion: unless you're some kind of Hive guru, use a directory location
and get that to work before trying to get clever with file locations,
especially when you see an error message about "not a directory" and being unable
to create it :)   Walk before you run, good people.


On Thu, Jun 20, 2013 at 11:55 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

 Ramki,

 I was going through that thread before, as Sanjeev said it worked, so I was
 doing some experiments as well.
 Like you, I too had the impression that Hive tables are associated with
 directories, and as pointed out, I was wrong.

 Basically the idea of pointing a table to a file, as mentioned on that
 thread, is kind of a hack:
 create the table without a location, then
 alter the table to point to the file.

 From Mark's answer, what he suggests is that we can use the virtual column
 INPUT__FILE__NAME to select which file we want to use while querying, in
 case there are multiple files inside a directory and you just want to use a
 specific one.

 The bug I mentioned is for files, having particular files from a
 directory matching the regex, not for the regex SerDe.

 Correct my understanding if I got anything wrong
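
 For reference, a rough sketch of that virtual-column trick (assuming an external
 table named access defined over the whole directory, as in the DDL earlier in this
 thread; the file name is the one from Sanjeev's listing):

 hive> SELECT host, status
     > FROM access
     > WHERE INPUT__FILE__NAME LIKE '%FlumeData.1371144648033';

 Hive still scans every file in the directory, but only rows that came from that
 particular file survive the WHERE clause.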





Re: show table throwing strange error

2013-06-20 Thread Sanjay Subramanian
Can you try from your Ubuntu command prompt:
$ hive -e "show tables"

From: Mohammad Tariq donta...@gmail.com
Reply-To: user@hive.apache.org
Date: Thursday, June 20, 2013 4:28 AM
To: user user@hive.apache.org
Subject: Re: show table throwing strange error

Thank you for the response ma'am. It didn't help either.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Thu, Jun 20, 2013 at 8:43 AM, Sunita Arvind sunitarv...@gmail.com wrote:
Your issue seems familiar. Try logging out of hive session and re-login.

Sunita


On Wed, Jun 19, 2013 at 8:53 PM, Mohammad Tariq donta...@gmail.com wrote:
Hello list,

 I have a hive(0.9.0) setup on my Ubuntu box running hadoop-1.0.4.
Everything was going smooth till now. But today when I issued "show tables" I got
some strange error on the CLI. Here is the error:

hive> show tables;
FAILED: Parse Error: line 1:0 character '' not supported here
line 1:1 character '' not supported here
line 1:2 character '' not supported here
line 1:3 character '' not supported here
line 1:4 character '' not supported here
[... the same "character '' not supported here" message repeats for every position through line 1:377 ...]
line 1:378 character '' not supported here
line 1:379 character '' not supported here
line 1:380 character '' not supported here
line 1:381 character '' not supported here

Strangely, other queries like "select foo from pokes where bar = 'tariq';" are
working fine. I tried to search over the net but could not find anything
useful. I need some help.

Thank you so much for your time.

Warm Regards,
Tariq
cloudfront.blogspot.com




Run queries from external files as subqueries

2013-06-20 Thread Sha Liu
Hi,
While working on some complex queries with multiple levels of subqueries, I'm
wondering if it is possible in Hive to refactor these subqueries into different
files and instruct the enclosing query to execute these files. This way these
subqueries can potentially be reused by other queries or just run by
themselves.

Thanks,
Sha Liu

Re: Run queries from external files as subqueries

2013-06-20 Thread Bertrand Dechoux
I am afraid that there is no automatic way of doing so. But that would be
the same answer whether the question is about hive or any relational
database.
(I would be glad to have counter examples.)

You might want to look at Oozie in order to manage workflows. But the
creation of the workflow is manual indeed.
http://oozie.apache.org/

Regards

Bertrand








-- 
Bertrand Dechoux


Re: Run queries from external files as subqueries

2013-06-20 Thread Jan Dolinár
A quick and dirty way to do such a thing would be to use some kind of
preprocessor. To avoid writing one, you could use e.g. the one from GCC,
with just a little help from sed:

gcc -E -x c query.hql -o- | sed '/#/d' > preprocessed.hql
hive -f preprocessed.hql

Where query.hql can contain for example something like

SELECT * FROM (
#include "subquery.hql"
) t
WHERE id = 1;

The includes can be nested and multiplied as much as necessary. As a bonus,
you could also use #define for repeated parts of code and/or #ifdef to
build different queries based on parameters passed to gcc ;-)

Best regards,
Jan Dolinar





Re: INSERT non-static data to array?

2013-06-20 Thread Michael Malak
I've created
https://issues.apache.org/jira/browse/HIVE-4771

to track this issue.


- Original Message -
From: Michael Malak michaelma...@yahoo.com
To: user@hive.apache.org user@hive.apache.org
Cc: 
Sent: Wednesday, June 19, 2013 2:35 PM
Subject: Re: INSERT non-static data to array?

The example code for inline_table() there has static data.  It's not possible 
to use a subquery inside the inline_table() or array() is it?

The SQL1999 way is described here:

http://www.postgresql.org/message-id/20041028232152.ga76...@winnie.fuhr.org


CREATE TABLE table_a(a int, b int, c int[]);

INSERT INTO table_a
  SELECT a, b, ARRAY(SELECT c FROM table_c WHERE table_c.parent = table_b.id)
  FROM table_b


From: Edward Capriolo edlinuxg...@gmail.com
To: user@hive.apache.org user@hive.apache.org; Michael Malak 
michaelma...@yahoo.com 
Sent: Wednesday, June 19, 2013 2:06 PM
Subject: Re: INSERT non-static data to array?



: https://issues.apache.org/jira/browse/HIVE-3238


This might fit the bill.




On Wed, Jun 19, 2013 at 3:23 PM, Michael Malak michaelma...@yahoo.com wrote:

Is the only way to INSERT data into a column of type array to load data from 
a pre-existing file, to use hard-coded values in the INSERT statement, or copy 
an entire array verbatim from another table?  I.e. I'm assuming that a) SQL1999 
array INSERT via subquery is not (yet) implemented in Hive, and b) there is 
also no other way to load dynamically generated data into an array column?  
If my assumption in a) is true, does a Jira item need to be created for it?



Re: INSERT non-static data to array?

2013-06-20 Thread Michael Malak
My understanding is that LATERAL VIEW goes the other direction: takes an array 
and makes it into separate rows.  I use that a lot.  But I also need to go the 
other way sometimes: take a bunch of rows and squeeze them down into an array.  
Please correct me if I'm missing something.
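
For the rows-to-array direction, a rough sketch that should work (reusing table_c
with columns parent and c from the Postgres example earlier in this thread, treated
here as an assumed Hive table): the built-in collect_set() aggregate squeezes the
grouped rows down into an array, e.g.

SELECT parent, collect_set(c) AS c_array
FROM table_c
GROUP BY parent;

collect_set() drops duplicate values, so it is not an exact match for the SQL1999
ARRAY(subquery) form, but joined back to table_b it comes close.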
 


 From: Edward Capriolo edlinuxg...@gmail.com
To: user@hive.apache.org user@hive.apache.org; Michael Malak 
michaelma...@yahoo.com 
Sent: Thursday, June 20, 2013 9:15 PM
Subject: Re: INSERT non-static data to array?
  


i think you could select into a sub query and then use lateral view. Not
exactly the same, but something similar could be done.




Re: Question regarding nested complex data type

2013-06-20 Thread Dean Wampler
It's not as simple as it seems, as I discovered yesterday, to my
surprise. I created a table like this:

CREATE TABLE t (
  name STRING,
  stuff   ARRAY<STRUCT<foo:String, bar:INT>>);

I then used an insert statement to see how Hive would store the records, so
I could populate the real table with another process. Hive used ^A for the
field separator, ^B for the collection separator, in this case, to separate
structs in the array, and ^C to separate the elements in each struct, e.g.,:

Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3

In other words, the structure you would expect for this table:

CREATE TABLE t (
  name STRING,
  stuff   MAP<String, INT>);

We should have covered the permutations of nested structures in our book,
but we didn't. It would be great to document them, for realz, somewhere.
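
To make the delimiter question from this thread concrete, here is a sketch of the
example table discussed earlier, with the first nesting level given an explicit
delimiter and the struct fields left on the default (illustration only, same sample
data as before):

create table example(col1 int, col2 array<struct<st1:int,st2:string>>)
row format delimited
  fields terminated by ','
  collection items terminated by '|';

With that DDL a row is laid out as

1,1^Cstring1|2^Cstring2

i.e. ',' between the top-level columns, '|' between the array elements, and the
default ^C between the struct members.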

dean

On Thu, Jun 20, 2013 at 9:56 AM, Stephen Sprague sprag...@gmail.com wrote:

 you only get three.  field separator, array elements separator (aka
 collection delimiter), and map key/value separator (aka map key
 delimiter).

 when you  nest deeper then you gotta use the default '^D', '^E' etc for
 each level.  At least that's been my experience which i've found has worked
 successfully.


 On Thu, Jun 20, 2013 at 7:45 AM, neha ms.nehato...@gmail.com wrote:

 Thanks a lot for your reply, Stephen.
 To answer your question - I was not aware of the fact that we could use
 delimiter (in my example, '|') for first level of nesting. I tried now and
 it worked fine.

 My next question - Is there any way to provide delimiter in DDL for
 second level of nesting?
 Thanks again!!


  On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague sprag...@gmail.com wrote:

  it's all there in the documentation under CREATE TABLE, and it seems you
  got everything right too except one little thing: in your second example
  there, for 'sample data loaded', instead of '^B' change that to '|' and
  you should be good. That's the delimiter that separates your two array
  elements, i.e. collections.

 i guess the real question for me is when you say 'since there is no way
 to use given delimiter | ' what did you mean by that?










-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com


Re: Question regarding nested complex data type

2013-06-20 Thread Stephen Sprague
look at it the other way around if you want.  knowing an array of a two-element
struct is topologically the same as a map, they darn well better be the
same. :)


