failed with null pointer exception.
hive>select /*+ MAPJOIN(a) */ a.url_pattern, w.url from (select
x.url_pattern from application x where x.dt = '20090609') a join web_log w
where w.logdate='20090611' and w.url rlike a.url_pattern;
FAILED: Unknown exception : null
$cat /tmp/hive/hive.log | tail...
2009-06-15 13:57:02,933 ERROR ql.Driver (SessionState.java:printError(279)) - FAILED: Unknown exception : null
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.parse.QBMetaData.getTableForAlias(QBMetaData.java:76)
    at org.apache.hadoop.hive.ql.parse.PartitionPruner.getTableColumnDesc(PartitionPruner.java:284)
    at org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:217)
    at org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
    at org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
    at org.apache.hadoop.hive.ql.parse.PartitionPruner.genExprNodeDesc(PartitionPruner.java:231)
    at org.apache.hadoop.hive.ql.parse.PartitionPruner.addExpression(PartitionPruner.java:377)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPartitionPruners(SemanticAnalyzer.java:608)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:3785)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:177)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:176)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:309)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
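
For reference, the join this query asks for can be sketched outside Hive as a nested-loop regex match with the small table held in memory. This is only a minimal Java sketch, not Hive's implementation: the class and method names are hypothetical, the sample patterns and URLs are made up, and it assumes that rlike means a partial match (find) rather than a whole-string match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: an in-memory regex "map join".
class RegexMapJoinSketch {

    // Compile every url_pattern from the small table once, up front.
    static List<Pattern> loadPatterns(List<String> urlPatterns) {
        List<Pattern> compiled = new ArrayList<Pattern>();
        for (String p : urlPatterns) {
            compiled.add(Pattern.compile(p));
        }
        return compiled;
    }

    // Emit one (url_pattern, url) row per matching pair, like an inner join.
    static List<String[]> join(List<Pattern> patterns, List<String> urls) {
        List<String[]> out = new ArrayList<String[]>();
        for (String url : urls) {
            for (Pattern p : patterns) {
                // Assumption: partial match (find), not whole-string match.
                if (p.matcher(url).find()) {
                    out.add(new String[] { p.pattern(), url });
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for application.url_pattern and web_log.url.
        List<Pattern> patterns = loadPatterns(Arrays.asList("^/item/\\d+", "/search"));
        List<String> urls = Arrays.asList("/item/42", "/home", "/shop/search?q=x");
        for (String[] row : join(patterns, urls)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Every URL is tested against every pattern, so with 20k patterns the CPU time spent in regex matching is likely to matter more than the memory needed to hold the compiled patterns.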
On Mon, Jun 15, 2009 at 1:52 PM, Namit Jain <[email protected]> wrote:
> The problem seems to be in partition pruning – The small table
> ‘application’ is partitioned – and probably, there are 20k rows in the
> partition
> 20090609.
>
> Due to a bug, the pruning is not happening, and all partitions of
> ‘application’ are being loaded – which may be too much for map-join to
> handle.
> This is a serious bug, but for now can you put in a subquery and try -
>
> select /*+ MAPJOIN(a) */ a.url_pattern, w.url from (select x.url_pattern
> from application x where x.dt = '20090609') a join web_log w where
> w.logdate='20090611' and w.url rlike a.url_pattern;
>
>
> Please file a JIRA for the above.
>
>
>
>
> On 6/14/09 10:20 PM, "Min Zhou" <[email protected]> wrote:
>
> hive> explain select /*+ MAPJOIN(a) */ a.url_pattern, w.url from
> application a join web_log w where w.logdate='20090611' and w.url rlike
> a.url_pattern and a.dt='20090609';
> OK
> ABSTRACT SYNTAX TREE:
> (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF application a) (TOK_TABREF
> web_log w))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
> (TOK_SELECT (TOK_HINTLIST (TOK_HINT TOK_MAPJOIN (TOK_HINTARGLIST a)))
> (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) url_pattern)) (TOK_SELEXPR (.
> (TOK_TABLE_OR_COL w) url))) (TOK_WHERE (and (and (= (. (TOK_TABLE_OR_COL w)
> logdate) '20090611') (rlike (. (TOK_TABLE_OR_COL w) url) (.
> (TOK_TABLE_OR_COL a) url_pattern))) (= (. (TOK_TABLE_OR_COL a) dt)
> '20090609')))))
>
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-2 depends on stages: Stage-1
> Stage-0 is a root stage
>
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> w
> Select Operator
> expressions:
> expr: url
> type: string
> expr: logdate
> type: string
> Common Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {0} {1}
> 1 {0} {1}
> keys:
> 0
> 1
> Position of Big Table: 1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> Local Work:
> Map Reduce Local Work
> Alias -> Map Local Tables:
> a
> Fetch Operator
> limit: -1
> Alias -> Map Local Operator Tree:
> a
> Select Operator
> expressions:
> expr: url_pattern
> type: string
> expr: dt
> type: string
> Common Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {0} {1}
> 1 {0} {1}
> keys:
> 0
> 1
> Position of Big Table: 1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>
> Stage: Stage-2
> Map Reduce
> Alias -> Map Operator Tree:
> hdfs://hdpnn.cm3:9000/group/taobao/hive/hive-tmp/220575636/10004
> Select Operator
> Filter Operator
> predicate:
> expr: (((3 = '20090611') and (2 regexp 0)) and (1 =
> '20090609'))
> type: boolean
> Select Operator
> expressions:
> expr: 0
> type: string
> expr: 2
> type: string
> File Output Operator
> compressed: true
> GlobalTableId: 0
> table:
> input format:
> org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
> Stage: Stage-0
> Fetch Operator
> limit: -1
>
> On Mon, Jun 15, 2009 at 1:14 PM, Namit Jain <[email protected]> wrote:
>
> I was looking at the code – and there may be a bug in cartesian product
> codepath for map-join.
>
> Can you run an explain plan and send it?
>
>
>
>
>
>
> On 6/14/09 10:06 PM, "Min Zhou" <[email protected]> wrote:
>
>
> 1. Tried setting hive.mapjoin.cache.numrows to 100; it failed with the
> same exception.
> 2. Actually, we used to do the same thing in plain map-reduce on the same
> cluster, loading the small table into memory on each map node. The heap
> size was guaranteed to be the same for the Hive map-side join and for our
> map-reduce job. OOM exceptions never happened there: loading those 20k
> records took only 1MB, while mapred.child.java.opts was set to -Xmx200m.
>
> Here is the schema of our small table:
> > describe application;
> transaction_id string
> subclass_id string
> class_id string
> memo string
> url_alias string
> url_pattern string
> dt string (daily partitioned)
>
> Thanks,
> Min
> On Mon, Jun 15, 2009 at 12:51 PM, Namit Jain <[email protected]> wrote:
>
> 1. Can you reduce the number of cached rows and try ?
>
> 2. Were you using the default memory settings of the mapper? If yes, can you
> increase them and try?
>
> It would be useful to try both of them independently – it would give a good
> idea of memory consumption of JDBM also.
>
>
> Can you send the exact schema/data of the small table if possible? You can
> file a JIRA and upload it there if it is not a security issue.
>
> Thanks,
> -namit
>
>
>
> On 6/14/09 9:23 PM, "Min Zhou" <[email protected]> wrote:
>
> 20k
>
>
>
>
>
>
--
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.
My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com