Question regarding nested complex data type
Hi All,

I have two questions about complex data types in nested composition.

1) I did not find a way to provide delimiter information in the DDL if one or more columns have a nested array/struct. In that case the default delimiters have to be used for the complex-type column. Please let me know if this is a limitation as of now or whether I am missing something. e.g.:

DDL:
hive> create table example(col1 int, col2 array<struct<st1:int,st2:string>>) row format delimited fields terminated by ',';
OK
Time taken: 0.226 seconds

Sample data loaded:
1,1^Cstring1^B2^Cstring2

O/P:
hive> select * from example;
OK
1	[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
Time taken: 0.288 seconds

2) For the same DDL given above, if we provide the clause collection items terminated by '|' and still load data with the default delimiters (since there is no way to use the given delimiter '|'), then the select query shows incorrect data. Please let me know if this is expected. e.g.:

DDL:
hive> create table example(col1 int, col2 array<struct<st1:int,st2:string>>) row format delimited fields terminated by ',' collection items terminated by '|';
OK
Time taken: 0.175 seconds

Sample data loaded:
1,1^Cstring1^B2^Cstring2

O/P:
hive> select * from example;
OK
1	[{"st1":1,"st2":"string1\u00022"}]
Time taken: 0.141 seconds

Thanks & Regards.
Re: show table throwing strange error
Thank you for the response ma'am. It didn't help either.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Thu, Jun 20, 2013 at 8:43 AM, Sunita Arvind sunitarv...@gmail.com wrote:
Your issue seems familiar. Try logging out of the hive session and re-logging in.
Sunita

On Wed, Jun 19, 2013 at 8:53 PM, Mohammad Tariq donta...@gmail.com wrote:
Hello list,

I have a hive (0.9.0) setup on my Ubuntu box running hadoop-1.0.4. Everything was going smoothly till now, but today when I issued "show tables" I got a strange error on the CLI. Here is the error:

hive> show tables;
FAILED: Parse Error: line 1:0 character '' not supported here
line 1:1 character '' not supported here
line 1:2 character '' not supported here
. . .
line 1:380 character '' not supported here
line 1:381 character '' not supported here

Strangely, other queries like "select foo from pokes where bar = 'tariq';" are working fine. I tried to search over the net but could not find anything useful. Need some help. Thank you so much for your time.

Warm Regards,
Tariq
cloudfront.blogspot.com
Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
We have a few dozen files that need to be made available to all mappers/reducers in the cluster while running hive transformation steps. It seems that add archive does not unarchive the entries and thus does not make them available directly on the default file path - and that is what we are looking for. To illustrate:

add file modelfile.1;
add file modelfile.2;
..
add file modelfile.N;

Then our model, which is invoked during the transformation step, does have correct access to its model files in the default path. But those model files take low minutes to all load. Instead, when we try:

add archive modelArchive.tgz;

the problem is that the archive apparently does not get exploded. For example, I have an archive that contains shell scripts stored under the hive directory inside it. I am not able to access hive/my-shell-script.sh after adding the archive. Specifically, the following fails:

$ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
-rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh

from (select transform (aappname, qappname)
      using 'hive/parse_qx.py'
      as (aappname2 string, qappname2 string)
      from eqx) o
insert overwrite table c
select o.aappname2, o.qappname2;

Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No such file or directory
Re: Hive select shows null after successful data load
hooray! Over one hurdle and onto the next one. So something about that one nested array caused the problem. Very strange. I wonder if there is a smaller test case to look at, as it seems not all arrays break it - I see one for the attribute values.

As to the formatting issue, I don't believe the native hive client has much to offer there; it's bare bones and record oriented. beeline seems to be another open-source hive client that looks to have more options - you might have a gander at that, though I don't think it has anything special for pretty-printing arrays, maps or structs (but I could be wrong). And then of course nothing is stopping you from exploring piping that gnarly stuff into python (or whatever) and having it come out the other end all nice and pretty - and then posting that here. :)

On Wed, Jun 19, 2013 at 7:54 PM, Sunita Arvind sunitarv...@gmail.com wrote:

Finally I could get it to work. The issue resolves once I remove the arrays within the position structure. So that is the limitation of the serde. I changed 'industries' to STRING and 'jobFunctions' to MAP<STRING,STRING>, and I can query the table just fine now. Here is the complete DDL for reference:

create external table linkedin_Jobsearch (
  jobs STRUCT<
    values : ARRAY<STRUCT<
      company : STRUCT<id : STRING, name : STRING>,
      postingDate : STRUCT<year : STRING, day : STRING, month : STRING>,
      descriptionSnippet : STRING,
      expirationDate : STRUCT<year : STRING, day : STRING, month : STRING>,
      position : STRUCT<
        jobFunctions : MAP<STRING,STRING>, -- these were arrays of structs in my previous attempts
        industries : STRING,
        title : STRING,
        jobType : STRUCT<code : STRING, name : STRING>,
        experienceLevel : STRUCT<code : STRING, name : STRING>>,
      id : STRING,
      customerJobCode : STRING,
      skillsAndExperience : STRING,
      salary : STRING,
      jobPoster : STRUCT<id : STRING, firstName : STRING, lastName : STRING, headline : STRING>,
      referralBonus : STRING,
      locationDescription : STRING>>>)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/sunita/tables/jobs';

Thanks Stephen for sharing your thoughts. It helped.

Also, if someone / Stephen could help me display this information in a useful manner, that would be great. Right now all the values show up as arrays. Here is what I mean. For a query like this:

hive> select jobs.values.company.name, jobs.values.position.title, jobs.values.locationdescription from linkedin_jobsearch;

this is the output:

[CyberCoders,CyberCoders,CyberCoders,Management Science Associates,Google,Google,CyberCoders,CyberCoders,HP,Sigmaways,Global Data Consultancy,Global Data Consultancy,CyberCoders,CyberCoders,CyberCoders,VMware,CD IT Recruitment,CD IT Recruitment,Digital Reasoning Systems,AOL]
[Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics,Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics,Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics,Data Architect,Systems Engineer, Site Reliability Engineering,Systems Engineer, Site Reliability Engineering,NoSQL Engineer - MongoDB for big data, web crawling - RELO OFFER,NoSQL Engineer - MongoDB for big data, web crawling - RELO OFFER,Hadoop Database Administrator Medicare,Hadoop / Big Data Consultant,Lead Hadoop developer,Head of Big Data - Hadoop,Hadoop Engineer - Hadoop, Operations, Linux Admin, Java, Storage,Sr. Hadoop Administrator - Hadoop, MapReduce, HDFS,Sr. Hadoop Administrator - Hadoop, MapReduce, HDFS,Software Engineer - Big Data,Hadoop Team Lead Consultant - Global Leader in Big Data solutions,Hadoop Administrator Consultant - Global Leader in Big Data solutions,Java Developer,Sr.Software Engineer-Big Data-Hadoop]
[Pittsburgh, PA,Pittsburgh, PA,Harrisburg, PA,Pittsburgh, PA (Shadyside area near Bakery Square),Pittsburgh, PA, USA,Pittsburgh, PA,Cleveland, OH,Akron, OH,Herndon, VA,Cupertino, CA,London, United Kingdom,London, United Kingdom,Mountain View, CA,san jose, CA,Santa Clara, CA,Palo Alto, CA,Home based - Live anywhere in the UK or Benelux,Home based - Live anywhere in the UK or Benelux,Herndon, VA,Dulles, VA]
Time taken: 8.518 seconds

All company names come into one array, all position titles into another array, and all locationdescription values into yet another array. I cannot map one value to the other. The query below gives a decent output where individual columns can be somewhat mapped:

hive> select jobs.values[0].company.name, jobs.values[0].position.title, jobs.values[0].locationdescription from linkedin_jobsearch;
CyberCoders  Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics  Pittsburgh, PA
Time taken: 8.543 seconds

But if I want the whole list, this does not work. I have tried setting input and output formats and setting serde properties to map to columns, but the output is the same. I haven't tried LATERAL VIEW json_tuple as yet; I found it cryptic and I hope there is something simpler. I can think of writing a UDF which
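One way to line those parallel arrays up against the DDL above - untested here, but LATERAL VIEW with explode() over an array of structs is standard Hive - is to turn each element of jobs.values into its own row and then pick fields off the resulting struct:

hive> select j.company.name, j.position.title, j.locationdescription
      from linkedin_jobsearch
      lateral view explode(jobs.values) jt as j;

Each output row would then hold one job's company, title, and location together, instead of three parallel arrays.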
Re: Question regarding nested complex data type
It's all there in the documentation under CREATE TABLE, and it seems you got everything right too, except one little thing - in your second example, for 'sample data loaded', change the '^B' to '|' and you should be good. That's the delimiter that separates your two array elements - i.e. the collections.

I guess the real question for me is: when you say 'since there is no way to use given delimiter |', what did you mean by that?

On Thu, Jun 20, 2013 at 1:42 AM, neha ms.nehato...@gmail.com wrote: ...
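Spelled out, the DDL from the question plus a sample row that matches it would look like this (a sketch; ^C stands for the literal \003 byte, which remains the default struct-field separator):

DDL:
hive> create table example(col1 int, col2 array<struct<st1:int,st2:string>>)
      row format delimited
      fields terminated by ','
      collection items terminated by '|';

Sample data that matches it:
1,1^Cstring1|2^Cstring2

Expected O/P:
1	[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]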
Re: Question regarding nested complex data type
Thanks a lot for your reply, Stephen.

To answer your question - I was not aware of the fact that we could use a delimiter (in my example, '|') for the first level of nesting. I tried it now and it worked fine.

My next question - is there any way to provide a delimiter in the DDL for the second level of nesting?

Thanks again!!

On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague sprag...@gmail.com wrote: ...
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
What would be interesting would be to run a little experiment and find out what the default PATH is on your data nodes. How much of a pain would it be to run a little python script that prints to stderr the value of the environment variables $PATH and $PWD (or the output of the shell command 'pwd')? That would of course go through the normal channels of add file.

The thing is, given you're using a relative path (hive/parse_qx.py), you need to know what the current directory is when the process runs on the data nodes.

On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch java...@gmail.com wrote: ...
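A minimal version of that experiment might look like the following (a sketch; probe.py is a hypothetical name, and the script doubles as an identity transform so the query still returns rows):

#!/usr/bin/env python
# probe.py - report where a transform task actually runs.
import os
import sys

# These lines land in the task's stderr log, visible in the jobtracker UI.
sys.stderr.write("PWD=%s\n" % os.getcwd())
sys.stderr.write("PATH=%s\n" % os.environ.get("PATH", ""))

# Pass input rows through unchanged so the transform still produces output.
for line in sys.stdin:
    sys.stdout.write(line)

Shipped and invoked the usual way, reusing the table from the thread:

add file probe.py;
select transform (aappname) using 'probe.py' as (aappname string) from eqx;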
Re: Question regarding nested complex data type
You only get three: the field separator, the array-element separator (aka collection delimiter), and the map key/value separator (aka map key delimiter). When you nest deeper than that you've got to use the defaults - '^D', '^E', etc. - for each level. At least that's been my experience, which I've found has worked successfully.

On Thu, Jun 20, 2013 at 7:45 AM, neha ms.nehato...@gmail.com wrote: ...
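To make the deeper levels concrete, here is an untested sketch: one more level of nesting than the original example, with the first level overridden and everything below it left at the defaults (^C and ^D standing for the literal \003 and \004 bytes):

hive> create table example2(col1 int, col2 array<struct<st1:int, st2:array<string>>>)
      row format delimited
      fields terminated by ','
      collection items terminated by '|';

A matching row - '|' separates the outer array's elements, ^C separates struct fields, and ^D separates the inner array's elements:

1,1^Cred^Dblue|2^Cgreen^Dyellow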
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
@Stephen: given that the 'relative' path for hive is from a local downloads directory on each local tasktracker in the cluster, it was my thought that if the archive were actually being expanded then somedir/somefileinthearchive should work. I will go ahead and test this assumption.

In the meantime, is there any facility available in hive for making archived files available to hive jobs - archive, or hadoop archive (har), etc.?

2013/6/20 Stephen Sprague sprag...@gmail.com: ...
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
I personally only know of adding a .jar file via add archive, but my experience there is very limited. I believe if you 'add file' and the file is a directory, it'll recursively take everything underneath, but I know of nothing that inflates or untars things on the remote end automatically.

I would 'add file' your python script and then, within that, untar your tarball to get at your model data. It's just a matter of figuring out the path to that tarball, which is kinda up in the air when it's added via 'add file'. Yeah - local downloads directory. What the literal path is, is what I'd like to know. :)

On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch java...@gmail.com wrote: ...
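Acting on that suggestion might look like the sketch below (hypothetical; it assumes the tarball was shipped with 'add file modelArchive.tgz' and so sits in the task's working directory, and that appminer/ is its top-level directory as in the tar listing above):

import os
import sys
import tarfile

# Inflate the shipped tarball once per task; later rows reuse the result.
if not os.path.isdir("appminer"):
    tar = tarfile.open("modelArchive.tgz")
    tar.extractall(".")
    tar.close()

# ... load model files from appminer/ here, then process the input rows ...
for line in sys.stdin:
    sys.stdout.write(line)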
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
Thx for the tip on 'add file' where the file is a directory. I will try that.

2013/6/20 Stephen Sprague sprag...@gmail.com: ...
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
Yeah - the archive isn't unpacked on the remote side. I think add archive is mostly used for shipping java packages, since the CLASSPATH will reference the archive itself (and as such there is no need to expand it).

On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch java...@gmail.com wrote: ...
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
Stephen: would you be willing to share an example of specifying a directory as the add file target? I have not seen this working. I have attempted to use it as follows:

We will access a script within the hivetry directory located here:

hive> ! ls -l /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
-rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37 /opt/am/ver/1.0/hive/hivetry/classifier_wf.py

Add the directory to hive:

hive> add file /opt/am/ver/1.0/hive/hivetry;
Added resource: /opt/am/ver/1.0/hive/hivetry

Attempt to run the transform query using that script.

Attempt one: use the script name unqualified:

hive> from (select transform (aappname,qappname) using 'classifier_wf.py' as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

(Failed: Caused by: java.io.IOException: Cannot run program "classifier_wf.py": java.io.IOException: error=2, No such file or directory)

Attempt two: use the script name with a directory-name prefix:

hive> from (select transform (aappname,qappname) using 'hive/classifier_wf.py' as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

(Failed: Caused by: java.io.IOException: Cannot run program "hive/classifier_wf.py": java.io.IOException: error=2, No such file or directory)

2013/6/20 Stephen Sprague sprag...@gmail.com: ...
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
In Attempt two, are you not supposed to use hivetry as the directory? Maybe you should try giving the full path /opt/am/ver/1.0/hive/hivetry/classifier_wf.py and see if it works.

Regards,
Ramki.

On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch java...@gmail.com wrote: ...
Hive External Table issue
Hello Everyone, I'm running into the following Hive external table issue.

hive> CREATE EXTERNAL TABLE access(
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s")
STORED AS TEXTFILE
LOCATION '/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';

FAILED: Error in metadata: MetaException(message:hdfs://h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033 is not a directory or unable to create one)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

In HDFS the file exists:

hadoop fs -ls /user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
Found 1 items
-rw-r--r-- 3 hdfs supergroup 2242037226 2013-06-13 11:14 /user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

I've downloaded the serde2 jar file too, installed it as /usr/lib/hive/lib/hive-json-serde-0.2.jar, and bounced all the hadoop services after that. I even added the jar file manually in hive and ran the above sql, but it is still failing.

hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar;
Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path
Added resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar

Any help would be highly appreciated.

-Sanjeev

--
Sanjeev Sagar
"Separate yourself from everything that separates you from others!" - Nirankari Baba Hardev Singh ji
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
Good eyes, Ramki! Thanks - directory-in-place-of-filename appears to be working. The script is getting loaded now using the Attempt two style, i.e. with hivetry/classifier_wf.py as the script path.

thanks again,
stephenb

2013/6/20 Ramki Palle ramki.pa...@gmail.com: ...
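For anyone landing on this thread later, the combination that reportedly worked is the one below (paths, table names, and columns as used earlier in the thread):

add file /opt/am/ver/1.0/hive/hivetry;

from (select transform (aappname, qappname)
      using 'hivetry/classifier_wf.py'
      as (aappname2 string, qappname2 string)
      from eqx) o
insert overwrite table c
select o.aappname2, o.qappname2;

The key detail: the resource is added as a directory, and the script is then referenced relative to that directory's basename.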
Re: Hive External Table issue
MetaException(message:hdfs://h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033 is not a directory or unable to create one)

It clearly says it's not a directory. Point to the directory and it will work.

On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar sanjeev.sa...@gmail.com wrote: ...

--
Nitin Pawar
Re: Hive External Table issue
I did mention in my mail that the hdfs file exists in that location. See below:

In HDFS the file exists:

hadoop fs -ls /user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
Found 1 items
-rw-r--r-- 3 hdfs supergroup 2242037226 2013-06-13 11:14 /user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

So the directory and the file both exist.

On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar nitinpawar...@gmail.com wrote: ...

--
Sanjeev Sagar
"Separate yourself from everything that separates you from others!" - Nirankari Baba Hardev Singh ji
Re: Hive External Table issue
In hive, when you create a table and use LOCATION to refer to an hdfs path, that path is supposed to be a directory. If the directory does not exist, hive will try to create it; if the path is a file, it will throw an error because it's not a directory. That is the error you are getting: the location you referred to is a file. Change it to the directory and see if that works for you.

On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar sanjeev.sa...@gmail.com wrote: ...

--
Nitin Pawar
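Concretely, the fix would be to keep the DDL from the first message and change only the LOCATION so it points at the date directory rather than the file (a sketch; Hive then reads every file under that directory):

CREATE EXTERNAL TABLE access(
  host STRING, identity STRING, user STRING, time STRING,
  request STRING, status STRING, size STRING, referer STRING, agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s")
STORED AS TEXTFILE
LOCATION '/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/';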
Re: Hive External Table issue
Two issues:

1. I've created external tables in Hive based on a file location before, and it worked without any issue. It doesn't have to be a directory.

2. If there is more than one file in the directory, and you create the external table based on the directory, then how does the table know which file it needs to look at for the data? I tried to create the table based on the directory; it created the table, but all the rows were NULL.

-Sanjeev
Re: Hive External Table issue
Mark has answered this before:
http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil

If this link does not answer your question, do let us know.

-- Nitin Pawar
Re: Hive External Table issue
1. I was under the impression that you cannot point the table location to a file, but it looks like it works. Please see the discussion in the thread:
http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/%3c556325346ca26341b6f0530e07f90d96017084360...@gbgh-exch-cms.sig.ads%3e

2. If there is more than one file in the directory, your query gets the data from all the files in that directory. In your case, the regex may not be parsing the data properly.

Regards,
Ramki.
Re: Hive External Table issue
Also see this JIRA: https://issues.apache.org/jira/browse/HIVE-951

I think the issue you are facing is due to that JIRA.

-- Nitin Pawar
Re: Hive External Table issue
Nitin,

Can you go through the thread with the subject "S3/EMR Hive: Load contents of a single file" (Tue, 26 Mar, 17:11) at http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1 ? It gives the whole discussion about a table location pointing to a filename vs. a directory. Can you give your insight from that discussion and the discussion you mentioned at the stackoverflow link?

Regards,
Ramki.
Re: Hive External Table issue
I agree. Conclusion: unless you're some kind of Hive guru, use a directory location and get that to work before trying to get clever with file locations - especially when you see an error message about "not a directory and unable to create it". :) Walk before you run, good people.

On Thu, Jun 20, 2013 at 11:55 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

Ramki, I was going through that thread before, as Sanjeev said it worked, so I was doing some experiments as well. Like you, I had the impression that Hive tables are associated with directories, and as pointed out, I was wrong. Basically, the idea of pointing a table to a file as mentioned on that thread is kind of a hack: create the table without a location, then alter the table to point to the file. From Mark's answer, what he suggests is that we can use the virtual column INPUT__FILE__NAME to select which file we want to use while querying, in case there are multiple files inside a directory and you just want to use a specific one. The bug I mentioned is about picking particular files from a directory matching a regex, not about the regex SerDe. Correct my understanding if I got anything wrong.

-- Nitin Pawar
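For anyone who wants the file-level filtering Mark describes, a hedged sketch (INPUT__FILE__NAME is Hive's virtual column holding the path of the file each row came from; the table here is assumed to point at the 13-06-13 directory):

-- read only the rows that came from one specific file in the directory
SELECT host, request, status
FROM access
WHERE INPUT__FILE__NAME LIKE '%FlumeData.1371144648033';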
Re: show table throwing strange error
Can you try from your Ubuntu command prompt:

$ hive -e "show tables"

From Mohammad Tariq's original message (the repeated "character '' not supported here" parse errors continue, one per character position, up through line 1:381): Strangely, other queries like select foo from pokes where bar = 'tariq'; are working fine. Tried to search over the net but could not find anything useful. Need some help. Thank you so much for your time.

Warm Regards,
Tariq
cloudfront.blogspot.com
Run queries from external files as subqueries
Hi,

While working on some complex queries with multiple levels of subqueries, I'm wondering if it is possible in Hive to refactor these subqueries into different files and instruct the enclosing query to execute those files. This way the subqueries could potentially be reused by other queries, or just run by themselves.

Thanks,
Sha Liu
Re: Run queries from external files as subqueries
I am afraid that there is no automatic way of doing so. But that would be the same answer whether the question is about Hive or any relational database. (I would be glad to have counterexamples.) You might want to look at Oozie in order to manage workflows, but the creation of the workflow is manual indeed. http://oozie.apache.org/

Regards,
Bertrand
Re: Run queries from external files as subqueries
A quick and dirty way to do such a thing would be to use some kind of preprocessor. To avoid writing one, you could use e.g. the one from GCC, with just a little help from sed:

gcc -E -x c query.hql -o- | sed '/#/d' > preprocessed.hql
hive -f preprocessed.hql

where query.hql can contain, for example, something like:

SELECT * FROM (
#include "subquery.hql"
) t WHERE id = 1;

The includes can be nested and multiplied as much as necessary. As a bonus, you could also use #define for repeated parts of code and/or #ifdef to build different queries based on parameters passed to gcc ;-)

Best regards,
Jan Dolinar
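To make the mechanics concrete, here is a hypothetical subquery.hql and the query the preprocessor would assemble (file name, table, and columns are illustrative, not from the original thread):

-- subquery.hql: a reusable piece of HiveQL kept in its own file
SELECT id, name
FROM some_table
WHERE dt = '2013-06-20'

-- what "hive -f preprocessed.hql" actually sees after gcc -E and sed:
SELECT * FROM (
SELECT id, name
FROM some_table
WHERE dt = '2013-06-20'
) t WHERE id = 1;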
Re: INSERT non-static data to array?
I've created https://issues.apache.org/jira/browse/HIVE-4771 to track this issue.

----- Original Message -----
From: Michael Malak
Sent: Wednesday, June 19, 2013 2:35 PM

The example code for inline_table() there has static data. It's not possible to use a subquery inside inline_table() or array(), is it? The SQL:1999 way is described here:
http://www.postgresql.org/message-id/20041028232152.ga76...@winnie.fuhr.org

CREATE TABLE table_a(a int, b int, c int[]);
INSERT INTO table_a
SELECT a, b, ARRAY(SELECT c FROM table_c WHERE table_c.parent = table_b.id)
FROM table_b;

From: Edward Capriolo
Sent: Wednesday, June 19, 2013 2:06 PM

https://issues.apache.org/jira/browse/HIVE-3238 - this might fit the bill.

On Wed, Jun 19, 2013 at 3:23 PM, Michael Malak michaelma...@yahoo.com wrote:

Is the only way to INSERT data into a column of type array to load the data from a pre-existing file, to use hard-coded values in the INSERT statement, or to copy an entire array verbatim from another table? I.e., I'm assuming that a) SQL:1999 array INSERT via subquery is not (yet) implemented in Hive, and b) there is also no other way to load dynamically generated data into an array column. If my assumption in a) is true, does a JIRA item need to be created for it?
Re: INSERT non-static data to array?
My understanding is that LATERAL VIEW goes the other direction: it takes an array and makes it into separate rows. I use that a lot. But I also need to go the other way sometimes: take a bunch of rows and squeeze them down into an array. Please correct me if I'm missing something.

From: Edward Capriolo
Sent: Thursday, June 20, 2013 9:15 PM

I think you could select into a subquery and then use LATERAL VIEW. Not exactly the same, but something similar could be done.
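In case it helps, one way to squeeze rows down into an array today is the collect_set UDAF (a hedged sketch reusing the table names from the PostgreSQL example above; collect_set is a built-in Hive aggregate, with the caveats that it removes duplicates and does not guarantee element order):

-- Approximate the SQL:1999 ARRAY(subquery) insert by joining the child
-- rows to their parents and aggregating each group into an array.
INSERT OVERWRITE TABLE table_a
SELECT b.a, b.b, collect_set(c.c)
FROM table_b b
JOIN table_c c ON (c.parent = b.id)
GROUP BY b.id, b.a, b.b;

(In Hive, table_a would need to be declared with c as ARRAY<INT> rather than int[].)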
Re: Question regarding nested complex data type
It's not as simple as it seems, as I discovered yesterday, to my surprise. I created a table like this:

CREATE TABLE t (
  name STRING,
  stuff ARRAY<STRUCT<foo:STRING, bar:INT>>);

I then used an INSERT statement to see how Hive would store the records, so I could populate the real table with another process. Hive used ^A for the field separator, ^B for the collection separator (in this case, to separate the structs in the array), and ^C to separate the elements in each struct, e.g.:

Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3

In other words, the structure you would expect for this table:

CREATE TABLE t (
  name STRING,
  stuff MAP<STRING, INT>);

We should have covered the permutations of nested structures in our book, but we didn't. It would be great to document them, for real, somewhere.

dean

On Thu, Jun 20, 2013 at 9:56 AM, Stephen Sprague sprag...@gmail.com wrote:

You only get three: the field separator, the array element separator (aka collection delimiter), and the map key/value separator (aka map key delimiter). When you nest deeper, you have to use the defaults '^D', '^E', etc. for each level. At least that's been my experience, which I've found has worked successfully.

On Thu, Jun 20, 2013 at 7:45 AM, neha ms.nehato...@gmail.com wrote:

Thanks a lot for your reply, Stephen. To answer your question - I was not aware of the fact that we could use the given delimiter (in my example, '|') for the first level of nesting. I tried it now and it worked fine. My next question - is there any way to provide a delimiter in the DDL for the second level of nesting? Thanks again!!

On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague sprag...@gmail.com wrote:

It's all there in the documentation under CREATE TABLE, and it seems you got everything right too, except one little thing: in your second example, for the 'sample data loaded', instead of '^B' change that to '|' and you should be good. That's the delimiter that separates your two array elements, i.e. collections. I guess the real question for me is, when you say 'since there is no way to use given delimiter |', what did you mean by that?

--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
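Putting Stephen's pointers together for the DDL from this thread (angle brackets restored; a sketch of what the thread converged on: with COLLECTION ITEMS TERMINATED BY '|', the array elements split on '|' and the struct fields fall back to the next delimiter down, the default ^C):

CREATE TABLE example (
  col1 INT,
  col2 ARRAY<STRUCT<st1:INT, st2:STRING>>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|';

-- matching sample row to load:
-- 1,1^Cstring1|2^Cstring2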
Re: Question regarding nested complex data type
Look at it the other way around if you want: knowing that an array of a two-element struct is topologically the same as a map, they darn well better be stored the same. :)