Hi
I have tried to do what you described. Let me explain the steps.

1- CREATE TABLE test (xmlFile STRING);
----------------------------------------------------------------------------------

2- LOAD DATA LOCAL INPATH '1.xml'
OVERWRITE INTO TABLE test;
----------------------------------------------------------------------------------

3- CREATE TABLE test_new (
    b STRING,
    c STRING
  )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
----------------------------------------------------------------------------------

4- ADD FILE sampleMapper.groovy;
----------------------------------------------------------------------------------

5- INSERT OVERWRITE TABLE test_new
SELECT
  TRANSFORM (xmlFile)
  USING 'sampleMapper.groovy'
  AS (b, c)
FROM test;
----------------------------------------------------------------------------------
*XML FILE*:
The XML file has only one row for testing purposes, which is:

<xy><a><b>Hello</b><c>world</c></a></xy>
----------------------------------------------------------------------------------
*MAPPER*
I have written the mapper in Groovy to parse it. The mapper is:

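// Read the first line of stdin (the single XML row passed in by Hive)
def xmlData = ""
System.in.withReader {
    xmlData = xmlData + it.readLine()
}

// Parse the row and print the text of <b> and <c> as tab-separated columns
def xy = new XmlParser().parseText(xmlData)
def b = xy.a.b.text()
def c = xy.a.c.text()
println([b, c].join('\t'))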
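
For comparison, here is a minimal Python sketch of the same mapper. It is only an illustration and was not run as part of this thread: the file name sample_mapper.py and the function names are hypothetical, and it assumes that Hive passes the script one complete XML document per line on stdin. Unlike the Groovy script above, which reads a single line, it loops over every input line.

```python
#!/usr/bin/env python
"""Hypothetical sample_mapper.py: turn one XML row per line into 'b<TAB>c'."""
import sys
import xml.etree.ElementTree as ET


def transform(line):
    """Return 'b<TAB>c' for one XML row, or None for a blank line."""
    line = line.strip()
    if not line:
        return None
    root = ET.fromstring(line)  # root element is <xy>
    # Missing elements become empty strings so the column count stays at two.
    b = root.findtext("a/b", default="")
    c = root.findtext("a/c", default="")
    return "\t".join([b, c])


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Stream rows from stdin to stdout, one output line per input row."""
    for line in stdin:
        row = transform(line)
        if row is not None:
            stdout.write(row + "\n")


if __name__ == "__main__":
    main()
```

It can be checked outside Hive with `python sample_mapper.py < 1.xml`. Hive's MovieLens example (the u_data / u_data_new one mentioned in the quoted message below) names the interpreter in the clause, as in USING 'python weekday_mapper.py'. Whether the Groovy mapper in step 5 needs the same treatment is an assumption that has not been verified here.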
----------------------------------------------------------------------------------
Steps 1-4 work fine, but when I run step 5, which loads the data from the
test table into the new table through the mapper, it throws an error. The
error on the console is:

*FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver*

I am having a hard time with this. Any suggestions?
Thanks

On Thu, Jun 10, 2010 at 3:05 AM, Ashish Thusoo <athu...@facebook.com> wrote:

>  You could load this whole xml file into a table with a single row and a
> single column. The default record delimiter is \n but you can create a table
> where the record delimiter is \001. Once you do that you can follow the
> approach that you described below. Will this solve your problem?
>
> Ashish
>
>  ------------------------------
> *From:* Shuja Rehman [mailto:shujamug...@gmail.com]
> *Sent:* Wednesday, June 09, 2010 3:07 PM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Load data from xml using Mapper.py in hive
>
> Hi
> I have created a table in Hive (say, table1 with two columns, col1 and
> col2).
>
> Now I have an XML file, for which I have written a Python script that reads
> the XML file and transforms it into single rows with tab-separated values,
> e.g. the output of the Python script can be
>
> row 1 = val1     val2
> row 2 = val3     val4
>
> So the output of the Python script is plain rows. Now I want to load this
> into the created table. I have seen the example in which the data is first
> loaded into the u_data table and then transformed by a Python script into
> u_data_new, but it does not fit my scenario, as I have an XML file as the
> source.
>
>
> Kindly let me know whether I can achieve this.
> Thanks
>
> --
>

-- 
Regards
Baig
