[ https://issues.apache.org/jira/browse/DRILL-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082946#comment-14082946 ]
Andy Pernsteiner commented on DRILL-1239:
-----------------------------------------

So I modified the JSON file so that the 'keywords' field is no longer an array (which may or may not contain values for each record); instead it is now a single non-null value per record. I am no longer hitting the above-mentioned error.

select * from click.`json_large`.`/mobile.json`;
<lots of rows>
1,000,000 rows selected (88.016 seconds)

What's interesting is that there is another field (trans_info.prod_id) which is also an array and sometimes has null values (although they are ints, not strings); those don't seem to be causing a problem. Next I'll try creating the JSON file with an array for keywords, but will ensure that there are no NULL values for any records. Note that the NULL values seem to have to line up with a boundary of sorts in order to cause an error (plenty of the records had them in my first attempts, but only some would trigger the failure).

> java.lang.AssertionError When performing select against nested JSON > 60,000 records
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-1239
>                 URL: https://issues.apache.org/jira/browse/DRILL-1239
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 0.4.0
>        Environment: Seen both on standalone (OSX host, 16 GB RAM) as well as on cluster in AWS:
> 5 nodes, centos-6.5, 64GB RAM, 2 SSDs/node for mfs/dfs. Running MapR 3.1.1.
>           Reporter: Andy Pernsteiner
>
> Using a JSON file with contents like so for each record:
> {code}
> {"trans_id":999999,"date":"11/03/2012","time":"09:07:05","user_info":{"cust_id":2,"device":"AOS4.3","state":"tx"},"marketing_info":{"camp_id":14,"keywords":["it","i","wants","yes","things","few","like"]},"trans_info":{"prod_id":[167,145,5,487,290],"purch_flag":"false"}}
> {code}
> First I set the following to get more verbose output:
> {quote}
> 0: jdbc:drill:> alter session set `exec.errors.verbose`=true;
> {quote}
> Then performed a simple select via sqlline:
> {quote}
> select * from dfs.`/mapr/drillram/JSON/large/mobile.json`;
> <50,000+ rows of output>
> | 56184 | 03/11/2013 | 14:19:10 | {"cust_id":4,"device":"IOS5","state":"va"} | {"camp_id":15,"keywords":["young"]} | {"prod_id |
> | 56185 | 07/03/2013 | 14:30:38 | {"cust_id":1518,"device":"AOS4.4","state":"wi"} | {"camp_id":11,"keywords":["so","way","okay |
> | 56186 | 07/07/2013 | 10:41:04 | {"cust_id":97279,"device":"IOS5","state":"ga"} | {"camp_id":7,"keywords":[]} | {"prod_id":[9 |
> Query failed: Failure while running fragment.
> null [4407eef7-06aa-4cf9-9962-a2f187ce8f17]
> Node details: ip-172-16-1-111:31011/31012
> java.lang.AssertionError
> 	at org.apache.drill.exec.vector.complex.WriteState.fail(WriteState.java:37)
> 	at org.apache.drill.exec.vector.complex.impl.AbstractBaseWriter.inform(AbstractBaseWriter.java:62)
> 	at org.apache.drill.exec.vector.complex.impl.RepeatedBigIntWriterImpl.inform(RepeatedBigIntWriterImpl.java:108)
> 	at org.apache.drill.exec.vector.complex.impl.RepeatedBigIntWriterImpl.setPosition(RepeatedBigIntWriterImpl.java:130)
> 	at org.apache.drill.exec.vector.complex.impl.SingleListWriter.setPosition(SingleListWriter.java:700)
> 	at org.apache.drill.exec.vector.complex.impl.SingleMapWriter.setPosition(SingleMapWriter.java:153)
> 	at org.apache.drill.exec.vector.complex.impl.SingleMapWriter.setPosition(SingleMapWriter.java:153)
> 	at org.apache.drill.exec.vector.complex.impl.VectorContainerWriter.setPosition(VectorContainerWriter.java:66)
> 	at org.apache.drill.exec.store.easy.json.JSONRecordReader2.next(JSONRecordReader2.java:80)
> 	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:148)
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:116)
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:59)
> 	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:98)
> 	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:49)
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:116)
> 	at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> java.lang.RuntimeException: java.sql.SQLException: Failure while
> trying to get next result batch.
> 	at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
> 	at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
> 	at sqlline.SqlLine.print(SqlLine.java:1809)
> 	at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
> 	at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
> 	at sqlline.SqlLine.dispatch(SqlLine.java:889)
> 	at sqlline.SqlLine.begin(SqlLine.java:763)
> 	at sqlline.SqlLine.start(SqlLine.java:498)
> 	at sqlline.SqlLine.main(SqlLine.java:460)
> {quote}
> If I re-run the same query against a smaller version of the same dataset (<50,000 records), I don't have the issue. So far I've tried modifying the DRILL_MAX_DIRECT_MEMORY and DRILL_MAX_HEAP variables to see if I could find something that works, but neither seems to make a difference. Note: the error appears the same if I run in standalone mode.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
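For reference, the two 'keywords' shapes discussed in the comment above can be produced with a short script. This is a hypothetical sketch, not part of the original report: `make_record` and the field values are illustrative, and it only generates the data shapes (an array with occasional nulls vs. a single scalar), not the Drill failure itself.

```python
import json

def make_record(trans_id, keywords):
    """Build one record in the nested shape from the report (values illustrative)."""
    return {
        "trans_id": trans_id,
        "date": "11/03/2012",
        "time": "09:07:05",
        "user_info": {"cust_id": 2, "device": "AOS4.3", "state": "tx"},
        "marketing_info": {"camp_id": 14, "keywords": keywords},
        "trans_info": {"prod_id": [167, 145, 5], "purch_flag": "false"},
    }

# Shape that triggered the AssertionError past ~50,000 records:
# 'keywords' is an array that sometimes contains null entries.
failing = json.dumps(make_record(1, ["it", None, "wants"]))

# Shape that avoided the error: a single non-null scalar per record.
working = json.dumps(make_record(2, "it"))

print(failing)
print(working)
```

Writing a few hundred thousand lines of each variant (one JSON object per line) gives a dataset large enough to cross the row-count threshold described above.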