[
https://issues.apache.org/jira/browse/PHOENIX-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501533#comment-14501533
]
James Taylor commented on PHOENIX-1870:
---------------------------------------
Good questions, [~shuxi0ng].
bq. 1. What tuple is used for?
Tuple is used to store the state of the current row being evaluated. It's
mainly used at evaluation time when an expression is a reference to a column.
If an expression.isStateless() is true, that means it doesn't need to contact
the server for any information. If an expression.isDetermism() ==
Determinism.ALWAYS, this means it'll give the same output each time with the
same input. Counterexamples are RAND(), CURRENT_DATE(), and NEXT VALUE FOR
my_sequence. So if expression.isStateless() is true and
expression.isDetermism() is Determinism.ALWAYS, then we don't need a Tuple and
can just use null instead. It'd actually be nice if we could remove Tuple
completely from the evaluate method, but that's a bigger change for another day.
bq. 2. In some functions, for example RegexpReplaceFunction, there are two
arguments, source strings and replace strings. How should the replace interface
be designed?
Maybe the most consistent way would be to pass a byte[] buf, int offset, int
length triple for each argument. You can then use the ImmutableBytesWritable
passed in to evaluate to evaluate each one, assigning local variables for the
above arguments before evaluating the next one.
bq. 2.1 replace(srcPtr, replacePtr): RegexpReplaceFunction has a class member
replacePtr, and
if is used in evaluation, and will be set to empty bytes after evaluation.
Oh, I missed that. That's potentially problematic, as these expressions are
potentially used by multiple threads at the same time. I'd recommend changing
that as I've described above.
bq. 2.2 replace(ptr, srcExpression, replaceExpression): Pass the expression to
Regex Engine, and srcStr and replaceStr use the same ptr. Which one is better?
See response for (2), as that's probably the most consistent (even though it
adds extra args).
bq. 3. for a function, REGEXP_REPLACE(PatternStr, SrcStr, ReplaceStr), if some
of the three arguments is Determinism.ALWAYS, so they can be pre-compute in
init(). Why not try to cache the value?
Yes, correct. If childExpression.isStateless() and Determinism.ALWAYS, then it
can be cached. FYI, we have a check above this in ExpressionCompiler that'll
evaluate a function at compile time if *all* of it's arguments are constant.
bq. Fox example, fun1( arg1, fun2(arg2)), if fun2(arg2) is Determinism.ALWAYS,
and in every evaluation of fun1, we don't have to compute fun2 again, because
we can pre-compute in fun1.init().
Yes, correct. In this case, fun2(arg2) will come in already pre-computed.
> Fix NPE occurring during regex processing when joni library not used
> --------------------------------------------------------------------
>
> Key: PHOENIX-1870
> URL: https://issues.apache.org/jira/browse/PHOENIX-1870
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: Shuxiong Ye
> Fix For: 5.0.0, 4.4.0
>
> Attachments:
> 0001-PHOENIX-1870-Fix-NPE-occurring-during-regex-processi.patch,
> PHOENIX-1870-Regex-Expression-refinement.patch, PHOENIX-1870_v2.patch
>
>
> Tests in error:
> {code}
>
> IndexExpressionIT.testImmutableCaseSensitiveFunctionIndex:1277->helpTestCaseSensitiveFunctionIndex:1321
> » Commit
>
> IndexExpressionIT.testImmutableLocalCaseSensitiveFunctionIndex:1282->helpTestCaseSensitiveFunctionIndex:1321
> » Commit
> {code}
> Stack trace:
> {code}
> testImmutableCaseSensitiveFunctionIndex(org.apache.phoenix.end2end.index.IndexExpressionIT)
> Time elapsed: 0.723 sec <<< ERROR!
> org.apache.phoenix.execute.CommitException:
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2
> actions:
> org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed
> to build index for unexpected reason!
> at
> org.apache.phoenix.hbase.index.util.IndexManagementUtil.rethrowIndexingException(IndexManagementUtil.java:180)
> at
> org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:206)
> at
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$35.call(RegionCoprocessorHost.java:989)
> at
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1671)
> at
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
> at
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1703)
> at
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:985)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2697)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2478)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2432)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2436)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:641)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:605)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1822)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NullPointerException
> at
> org.apache.phoenix.expression.util.regex.JavaPattern.substr(JavaPattern.java:83)
> at
> org.apache.phoenix.expression.function.RegexpSubstrFunction.evaluate(RegexpSubstrFunction.java:118)
> at
> org.apache.phoenix.index.IndexMaintainer.buildRowKey(IndexMaintainer.java:453)
> at
> org.apache.phoenix.index.IndexMaintainer.buildDeleteMutation(IndexMaintainer.java:842)
> at
> org.apache.phoenix.index.PhoenixIndexCodec.getIndexUpdates(PhoenixIndexCodec.java:173)
> at
> org.apache.phoenix.index.PhoenixIndexCodec.getIndexDeletes(PhoenixIndexCodec.java:119)
> at
> org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addDeleteUpdatesToMap(CoveredColumnsIndexBuilder.java:403)
> at
> org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addCleanupForCurrentBatch(CoveredColumnsIndexBuilder.java:287)
> at
> org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addMutationsForBatch(CoveredColumnsIndexBuilder.java:239)
> at
> org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.batchMutationAndAddUpdates(CoveredColumnsIndexBuilder.java:136)
> at
> org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.getIndexUpdate(CoveredColumnsIndexBuilder.java:99)
> at
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> at
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:129)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> ... 1 more
> : 2 times,
> at
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
> at
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
> at
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1563)
> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:928)
> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:942)
> at
> org.apache.phoenix.execute.MutationState.commit(MutationState.java:422)
> at
> org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:439)
> at
> org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:436)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at
> org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:436)
> at
> org.apache.phoenix.end2end.index.IndexExpressionIT.helpTestCaseSensitiveFunctionIndex(IndexExpressionIT.java:1321)
> at
> org.apache.phoenix.end2end.index.IndexExpressionIT.testImmutableCaseSensitiveFunctionIndex(IndexExpressionIT.java:1277)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)