Hi,

I have tried to do as you described. Let me explain the steps:

1- create table test (xmlFile String);

2- LOAD DATA LOCAL INPATH '1.xml' OVERWRITE INTO TABLE test;

3- CREATE TABLE test_new ( b STRING, c STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

4- add FILE sampleMapper.groovy;

5- INSERT OVERWRITE TABLE test_new SELECT TRANSFORM (xmlfile) USING 'sampleMapper.groovy' AS (b,c) FROM test;

*XML FILE*: the xml file has only one row, for testing purposes:

<xy><a><b>Hello</b><c>world</c></a></xy>

*MAPPER*: I have written the mapper in Groovy to parse it. The mapper is:

def xmlData = ""
System.in.withReader { xmlData = xmlData + it.readLine() }
def xy = new XmlParser().parseText(xmlData)
def b = xy.a.b.text()
def c = xy.a.c.text()
println([b, c].join('\t'))
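For reference, the script can be exercised outside Hive by piping the test row into it (this assumes the groovy command is available on the PATH of the machine running it):

echo '<xy><a><b>Hello</b><c>world</c></a></xy>' | groovy sampleMapper.groovy

which should print Hello and world separated by a tab.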
Now steps 1-4 are fine, but when I perform step 5, which loads the data from the test table into the new table using the mapper, it throws an error. The error on the console is:

*FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver*

I am having a hard time with this. Any suggestions?

Thanks

On Thu, Jun 10, 2010 at 3:05 AM, Ashish Thusoo <athu...@facebook.com> wrote:

> You could load this whole xml file into a table with a single row and a
> single column. The default record delimiter is \n but you can create a table
> where the record delimiter is \001. Once you do that you can follow the
> approach that you described below. Will this solve your problem?
>
> Ashish
>
> ------------------------------
> *From:* Shuja Rehman [mailto:shujamug...@gmail.com]
> *Sent:* Wednesday, June 09, 2010 3:07 PM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Load data from xml using Mapper.py in hive
>
> Hi
> I have created a table in hive (Suppose table1 with two columns, col1 and
> col2 )
>
> now i have an xml file for which i have write a python script which read
> the xml file and transform it in single row with tab seperated
> e.g the output of python script can be
>
> row 1 = val1 val2
> row2 = val3 val4
>
> so the output of file has straight rows with the help of python script. now
> i want to load this into created table. I have seen the example of in which
> the data is first loaded in u_data table then transform it using python
> script in u_data_new but in m scenario. it does not fit as i have xml file
> as source.
>
> Kindly let me know can I achieve this??
> Thanks
>
> --

--
Regards Baig