[jira] [Commented] (PHOENIX-1516) Add RANDOM built-in function

James Taylor (JIRA) Mon, 22 Dec 2014 22:59:48 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256672#comment-14256672
 ]


James Taylor commented on PHOENIX-1516:
---------------------------------------

When you use the wrapped result iterator, to reproduce the behavior you have 
now you'd need to skip reset() the first time, but call it every time 
afterwards before calling next(). I'm not sure exactly when Filter.reset() 
gets called, but I thought it was after each row was processed. If that's the 
case, then you'd need to cache the result of RandomFunction.evaluate() and 
keep returning the same value. The reset() call would then clear the cached 
result.
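
A minimal sketch of that caching idea, in case it helps. The class name 
CachedRandom and its fields are made up for illustration and are not the 
actual RandomFunction code. It only shows evaluate() returning a stable value 
until reset() is called:

```java
import java.util.Random;

/** Hypothetical stand-in for a stateful RAND expression; not Phoenix's RandomFunction. */
class CachedRandom {
    private final Random random;
    // null means no value has been computed for the current row yet
    private Double cached;

    CachedRandom(long seed) {
        this.random = new Random(seed);
    }

    /** Returns the same value on every call until reset() is invoked. */
    double evaluate() {
        if (cached == null) {
            cached = random.nextDouble();
        }
        return cached;
    }

    /** Clears the cached value so the next evaluate() draws a fresh number. */
    void reset() {
        cached = null;
    }
}
```

With this shape, evaluating the expression several times for one row gives a 
consistent answer, and whoever drives the row loop decides when a new value is 
drawn by calling reset().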

Good test: {{upsert into t select pk, rand() from t}}. It would be good to 
have a test like this both with and without autocommit on. With autocommit on, 
it'll be executed server-side and you'll need a reset() call in 
UngroupedAggregateRegionObserver.doPostScannerOpen():303, where we loop through 
each SELECT expression to create the mutations that get committed here in 
batches.

For autocommit off, there are a few issues to solve because the expression now 
keeps state. We have an optimization for "pipelining" the results of a SELECT 
back out directly as Puts to the table being upserted into (as opposed to 
spooling the results and then sending Puts based on the spooled results). 
We're currently sharing the same RowProjector across all of these threads, so 
we can either:
- turn off this optimization if RAND is used in UpsertCompiler.compile():399. 
We do the same when a sequence is used in an UPSERT SELECT (by not setting the 
parallelIteratorFactoryToBe var). In that case, each select expression is 
evaluated post parallelization by a single thread.
{code}
                    if (! (select.isAggregate() || select.isDistinct() || select.getLimit() != null || select.hasSequence()) ) {
                        // We can pipeline the upsert select instead of spooling everything to disk first,
                        // if we don't have any post processing that's required.
                        parallelIteratorFactoryToBe = new UpsertingParallelIteratorFactory(connection, tableRefToBe);
                        // If we're in the else, then it's not an aggregate, distinct, limted, or sequence using query,
                        // so we might be able to run it entirely on the server side.
                        runOnServer = sameTable && isAutoCommit && !(table.isImmutableRows() && !table.getIndexes().isEmpty());
                    }
{code}
- or clone the RowProjector in UpsertCompiler.mutate():193 which is where we 
manufacture the PhoenixStatement that'll run in each separate thread. We don't 
have a RowProjector.clone() method, though. The easiest way to accomplish this 
would be to serialize the SELECT expression and then deserialize it for each 
clone you want to make. See BooleanExpressionFilter.writeFields() and 
readFields() for serializing and deserializing. 
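
A rough sketch of the serialize-then-deserialize approach to cloning, using 
only java.io. The SeededExpression class, its fields, and the copyOf() helper 
are hypothetical stand-ins. The write()/readFields() pair just mimics the 
shape of the methods mentioned above, and the real expression serialization 
in Phoenix may look different:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CloneBySerialization {
    /** Hypothetical expression with a write/readFields pair. */
    static class SeededExpression {
        long seed;        // configuration: serialized
        int evaluations;  // per-thread runtime state: deliberately not serialized

        SeededExpression() {}

        SeededExpression(long seed) {
            this.seed = seed;
        }

        void write(DataOutput out) throws IOException {
            out.writeLong(seed);
        }

        void readFields(DataInput in) throws IOException {
            seed = in.readLong();
        }
    }

    /** Makes an independent copy by round-tripping through a byte array. */
    static SeededExpression copyOf(SeededExpression original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            SeededExpression copy = new SeededExpression();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked if they do.
            throw new UncheckedIOException(e);
        }
    }
}
```

Each thread would get its own copy, so any state kept on the expression is no 
longer shared. Only what write() emits survives the round trip, which is the 
point here: the copy starts with fresh runtime state.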

> Add RANDOM built-in function
> ----------------------------
>
>                 Key: PHOENIX-1516
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1516
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 1516-v2.txt, 1516-v3.txt, 1516.txt
>
>
> I often find it useful to generate some rows with random data.
> Here's a simple RANDOM() function that we could use for that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
