Sandra, 

Your approach #2 is on the right track, but it looks like there is a bug in 
how the external function framework handles optional record types.
The return type for "(SELECT VALUE r FROM RankingResult r)[0]" is computed as 
"RankingResultType?" which means that it could either be a record or NULL or 
MISSING.
There's a rule in the optimizer that deals with external functions and that 
rule incorrectly fails on optional record types.
It'd be great if you could file a bug for this.

As a workaround, try passing the record fields as primitive types to your 
function instead of the whole record:
LET rankingResult = (SELECT VALUE r FROM RankingResult r)[0] 
SELECT testlib#detectRelevance(newItem, rankingResult.first.score, 
rankingResult.second.score, rankingResult.third.score, 
rankingResult.fourth.score, rankingResult.fifth.score)

You'll also need to change the function declaration to accept primitive types 
instead of the record type:
<argument_type> ..., ADouble, ADouble, ADouble, ADouble, ADouble 
</argument_type>
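
To give a rough idea of what the Java side could look like once the scores 
arrive as plain doubles, here is a minimal sketch. It is not your actual 
detectRelevance code: the class name RelevanceSketch, the method rankOf, and 
the comparison logic are all assumptions made for illustration, and the 
AsterixDB external function wiring is left out so that the snippet runs on 
its own.

```java
public class RelevanceSketch {

    /**
     * Hypothetical helper: returns the 0-based position the new score would
     * take among the current top scores, or -1 if it beats none of them.
     * Assumes topScores is ordered from highest (first) to lowest (fifth).
     */
    public static int rankOf(double newScore, double... topScores) {
        for (int i = 0; i < topScores.length; i++) {
            if (newScore > topScores[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up values standing in for rankingResult.first.score through
        // rankingResult.fifth.score, passed as five separate doubles.
        int rank = rankOf(0.72, 0.95, 0.80, 0.70, 0.55, 0.40);
        System.out.println(rank); // prints 2
    }
}
```

Because each score is its own argument, the function never receives the 
optional record type, which is what the workaround relies on.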

Thanks,
-- Dmitry
 

On 4/25/19, 3:57 PM, "sandraskarshaug@" <gmail.com [email protected]> 
wrote:

    I forgot to add a link to the paper about the decoupled ingestion framework [1].
    
    [1] https://arxiv.org/pdf/1902.08271.pdf
    
    On 2019/04/25 22:54:15, [email protected] 
<[email protected]> wrote: 
    > Hi, thanks for your reply! I will try to be a bit more precise :-) 
    > 
    > I am currently testing the decoupled framework, and I would like to use 
data from another dataset when enriching tweets, here being data from the 
RankingResult dataset. Additionally, I would like to send the incoming tweet, 
as well as a record from RankingResult (say with id = 1) to a Java UDF (from 
within the SQL++ UDF) for more complex processing, like clustering the tweets, 
and scoring them based on how relevant they are for a given topic. The scoring 
within the Java UDF requires information about the record stored in 
RankingResult.
    > 
    > Applying the SQL++ UDF to a TwitterFeed, I aim to check whether a tweet 
is scored higher than the tweets found in the RankingList record (containing 
the top ranked tweets for the given topic). I see now that I could select the 
record I wish to use by "SELECT VALUE r FROM RankingResult r where id=1". One 
can think of the RankingResult dataset to hold one record per topic/user query 
which I want to find the top k most relevant tweets for. 
    > 
    > The overall goal of the project is to see if AsterixDB can be used to 
continuously rank tweets in real-time with respect to a user-defined query, 
meaning that the RankingResult record for the given user query should be 
updated continuously. I am however also looking into creating a TreeMap data 
structure in the Java UDF to hold the top current tweets and their scores, and 
use this for deciding whether the incoming tweet should switch place with any 
of the top ranked tweets. However, I would like to update the RankingResult 
record in order to make the data queryable.
    > 
    > 
    > Thanks in advance,
    > Sandra
    > 
    > On 2019/04/25 22:10:56, Mike Carey <[email protected]> wrote: 
    > > I will let someone else chime in on what the compilation error might be 
    > > about, but approach 1 has the problem that you rightly tried to correct 
    > > in approach 2 (because SELECT always returns an array of results).  But 
    > > - could you say a bit more - up 5000 feet - about the use case you are 
    > > trying to address...?  It's not clear (to me) why one might want to have 
    > > a single-item dataset - perhaps that's just a part of your 
    > > trying-to-make-this-work debugging - but it might help if the group 
    > > could see what you are trying to do overall.  (E.g., if you just want to 
    > > process incoming records on a feed, you wouldn't need another dataset 
    > > for that.  What's the more general picture/desire?)
    > > 
    > > Cheers,
    > > 
    > > Mike
    > > 
    > > On 4/25/19 12:08 AM, [email protected] wrote:
    > > > Hi devs!
    > > >
    > > > Given a datatype RankingResultType and a dataset 
RankingResult(RankingResultType) which contains only one record, what is the 
correct approach when I want to pass a single RankingResult record as an 
argument to a Java UDF in a SQL++ UDF? The resulting record of the Java UDF 
should be selected at the end of the UDF as it is going to be stored in the 
dataset the feed which uses the SQL++ UDF is attached to.
    > > >
    > > > CREATE FUNCTION rank(newItem) {
    > > >   LET rankingResult = *must select the record here*,
    > > >   SELECT testlib#detectRelevance(newItem, *must pass RankingResult 
record here*)
    > > > };
    > > >
    > > > I have tried some different approaches, for instance
    > > > 1. running LET rankingResult = (SELECT VALUE r FROM RankingResult r)
    > > >   SELECT testlib#detectRelevance(newItem, rankingResult)
    > > > 2. running LET rankingResult = (SELECT VALUE r FROM RankingResult 
r)[0] SELECT testlib#detectRelevance(newItem, rankingResult)
    > > >
    > > > The first approach throws a TypeMismatchException, ASX1002: Type 
mismatch: function testlib#detectRelevance expects its 2nd input parameter to 
be of type object, but the actual input type is array
    > > >
    > > > So I therefore tried to access the first element of the array in the 
second approach, but the second approach does not compile:
    > > > ASX1079: Compilation error: The input type union(RankingResultType: 
closed {
    > > >    id: bigint,
    > > >    first: RankingType: open { score: double },
    > > >    second: RankingType: open {score: double},
    > > >    third: RankingType: open { score: double},
    > > >    fourth: RankingType: open {score: double},
    > > >    fifth: RankingType: open {score: double}
    > > > } , null, missing) is not a valid record type!
    > > >
    > > > Could you maybe point me in the right direction?
    > > > Thanks in advance!
    > > >
    > > > Best,
    > > > Sandra
    > > >
    > > 
    > 
    
