Re: Add record as argument to Java UDF

sandraskarshaug Fri, 26 Apr 2019 08:55:44 -0700

Hi Dmitry, thanks for your reply!

I changed the Java function declaration, defined the SQL++ function as you 
described, and applied it to the TwitterFeed. However, the "start feed 
TwitterFeed" statement then never finishes executing; it just hangs (the web 
interface shows no job execution time). I am currently using an AsterixDB 
version provided by Xikui, which uses a decoupled ingestion framework.


Maybe Xikui knows whether there is any problem passing primitive types to the 
Java UDF in that version? The version with the decoupled ingestion framework 
uses a different function signature than the master version.

Best,
Sandra
On 2019/04/26 00:02:54, Dmitry Lychagin <[email protected]> wrote: 
> Sandra, 
> 
> Your approach #2 is on the right track, but it looks like there is a bug in 
> how the external function framework handles optional record types.
> The return type for "(SELECT VALUE r FROM RankingResult r)[0]" is computed as 
> "RankingResultType?" which means that it could either be a record or NULL or 
> MISSING.
> There's a rule in the optimizer that deals with external functions and that 
> rule incorrectly fails on optional record types.
> It'd be great if you could file a bug for this.
> 
> As a workaround, try passing record fields as primitive types to your 
> function instead of the whole record.
> LET rankingResult = (SELECT VALUE r FROM RankingResult r)[0] 
> SELECT testlib#detectRelevance(newItem, rankingResult.first.score, 
> rankingResult.second.score, rankingResult.third.score, 
> rankingResult.fourth.score, rankingResult.fifth.score)
> 
> You'll also need to change the function declaration to accept primitive 
> types instead of the record type:
> <argument_type> ..., ADouble, ADouble, ADouble, ADouble, ADouble 
> </argument_type>
> 
> Thanks,
> -- Dmitry
>  
> 
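For illustration only, here is a minimal plain-Java sketch of the comparison a detectRelevance-style function could perform once the five scores arrive as separate doubles instead of a record. It deliberately leaves out the AsterixDB external-function interface (how the UDF class is declared and how it reads its ADouble arguments), since that differs between the master version and the decoupled-ingestion build. The class name RelevanceCheck, the method rankPosition, and the assumption that first..fifth are ordered best-first are all hypothetical.

```java
public final class RelevanceCheck {

    /**
     * Hypothetical helper: returns the 1-based rank (1..5) the new tweet's score
     * would take among the current top five, or 0 if it beats none of them.
     * Assumes the five scores are ordered best-first: first >= second >= ... >= fifth.
     */
    public static int rankPosition(double newScore, double first, double second,
                                   double third, double fourth, double fifth) {
        double[] top = {first, second, third, fourth, fifth};
        for (int i = 0; i < top.length; i++) {
            if (newScore > top[i]) {
                return i + 1; // the new tweet would displace the tweet at this rank
            }
        }
        return 0; // not relevant enough to enter the top five
    }

    public static void main(String[] args) {
        System.out.println(rankPosition(0.75, 0.9, 0.8, 0.7, 0.6, 0.5)); // 3
        System.out.println(rankPosition(0.10, 0.9, 0.8, 0.7, 0.6, 0.5)); // 0
    }
}
```

Returning a rank rather than a boolean could let the calling SQL++ function decide which of first..fifth to overwrite when the RankingResult record is updated.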
> On 4/25/19, 3:57 PM, "sandraskarshaug@" <gmail.com 
> [email protected]> wrote:
> 
>     I forgot to add a link to the paper about the decoupled ingestion 
> framework [1].
>     
>     [1] https://arxiv.org/pdf/1902.08271.pdf
>     
>     On 2019/04/25 22:54:15, [email protected] <[email protected]> wrote: 
>     > Hi, thanks for your reply! I will try to be a bit more precise :-) 
>     > 
>     > I am currently testing the decoupled framework, and I would like to use 
> data from another dataset when enriching tweets, in this case the 
> RankingResult dataset. Additionally, I would like to send the incoming tweet, 
> as well as a record from RankingResult (say with id = 1), to a Java UDF (from 
> within the SQL++ UDF) for more complex processing, such as clustering the 
> tweets and scoring them by how relevant they are to a given topic. The 
> scoring within the Java UDF requires information about the record stored in 
> RankingResult.
>     > 
>     > Applying the SQL++ UDF to a TwitterFeed, I aim to check whether a tweet 
> scores higher than the tweets found in the RankingResult record (containing 
> the top-ranked tweets for the given topic). I see now that I could select the 
> record I wish to use with "SELECT VALUE r FROM RankingResult r WHERE r.id = 1". 
> One can think of the RankingResult dataset as holding one record per 
> topic/user query for which I want to find the top k most relevant tweets. 
>     > 
>     > The overall goal of the project is to see whether AsterixDB can be used 
> to continuously rank tweets in real time with respect to a user-defined query, 
> meaning that the RankingResult record for the given user query should be 
> updated continuously. I am also looking into creating a TreeMap data 
> structure in the Java UDF to hold the current top tweets and their scores, 
> and using it to decide whether the incoming tweet should swap places with 
> any of the top-ranked tweets. Even so, I would like to update the 
> RankingResult record in order to make the data queryable.
>     > 
>     > 
>     > Thanks in advance,
>     > Sandra
>     > 
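The TreeMap idea described above could look roughly like the following bounded top-k structure. This is a sketch under assumptions: TopKTweets, offer and idsBestFirst are hypothetical names, tweets are identified by a string id, and, to keep the map keyed by score, a tweet whose score exactly equals one already held is simply ignored. State kept in a Java object like this is not visible to SQL++ queries, which is why updating the RankingResult record is still needed for queryability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public final class TopKTweets {
    private final int k;
    // score -> tweet id; TreeMap keeps keys ascending, so firstKey() is the weakest kept tweet
    private final TreeMap<Double, String> top = new TreeMap<>();

    public TopKTweets(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        this.k = k;
    }

    /** Offers a scored tweet; returns true if it is now among the top k. */
    public boolean offer(String tweetId, double score) {
        if (top.containsKey(score)) {
            return false; // simplification: exact score ties are ignored
        }
        if (top.size() < k) {
            top.put(score, tweetId); // still filling up the top k
            return true;
        }
        if (score > top.firstKey()) {
            top.pollFirstEntry(); // evict the lowest-scored tweet
            top.put(score, tweetId);
            return true;
        }
        return false;
    }

    /** Tweet ids currently in the top k, highest score first. */
    public List<String> idsBestFirst() {
        return new ArrayList<>(top.descendingMap().values());
    }

    public static void main(String[] args) {
        TopKTweets ranking = new TopKTweets(3);
        ranking.offer("a", 0.5);
        ranking.offer("b", 0.9);
        ranking.offer("c", 0.7);
        ranking.offer("d", 0.6); // beats 0.5, so "a" is evicted
        ranking.offer("e", 0.4); // too low, rejected
        System.out.println(ranking.idsBestFirst()); // [b, c, d]
    }
}
```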
>     > On 2019/04/25 22:10:56, Mike Carey <[email protected]> wrote: 
>     > > I will let someone else chime in on what the compilation error might be 
>     > > about, but approach 1 has the problem that you rightly tried to correct 
>     > > in approach 2 (because SELECT always returns an array of results).  But 
>     > > - could you say a bit more - from 5000 feet up - about the use case you 
>     > > are trying to address...?  It's not clear (to me) why one might want to 
>     > > have a single-item dataset - perhaps that's just a part of your 
>     > > trying-to-make-this-work debugging - but it might help if the group 
>     > > could see what you are trying to do overall.  (E.g., if you just want 
>     > > to process incoming records on a feed, you wouldn't need another 
>     > > dataset for that.  What's the more general picture/desire?)
>     > > 
>     > > Cheers,
>     > > 
>     > > Mike
>     > > 
>     > > On 4/25/19 12:08 AM, [email protected] wrote:
>     > > > Hi devs!
>     > > >
>     > > > Given a datatype RankingResultType and a dataset 
> RankingResult(RankingResultType) which contains only one record, what is the 
> correct approach when I want to pass a single RankingResult record as an 
> argument to a Java UDF in a SQL++ UDF? The record returned by the Java UDF 
> should be what the final SELECT of the SQL++ UDF returns, since it is going to 
> be stored in the dataset to which the feed using the SQL++ UDF is attached.
>     > > >
>     > > > CREATE FUNCTION rank(newItem) {
>     > > >   LET rankingResult = *must select the record here*
>     > > >   SELECT testlib#detectRelevance(newItem, *must pass RankingResult 
> record here*)
>     > > > };
>     > > >
>     > > > I have tried some different approaches, for instance
>     > > > 1. running LET rankingResult = (SELECT VALUE r FROM RankingResult r)
>     > > >   SELECT testlib#detectRelevance(newItem, rankingResult)
>     > > > 2. running LET rankingResult = (SELECT VALUE r FROM RankingResult r)[0]
>     > > >   SELECT testlib#detectRelevance(newItem, rankingResult)
>     > > >
>     > > > The first approach throws a TypeMismatchException, ASX1002: Type 
>     > > > mismatch: function testlib#detectRelevance expects its 2nd input 
>     > > > parameter to be of type object, but the actual input type is array
>     > > >
>     > > > I therefore tried to access the first element of the array in the 
>     > > > second approach, but the second approach does not compile:
>     > > > ASX1079: Compilation error: The input type union(RankingResultType: closed {
> closed {
>     > > >    id: bigint,
>     > > >    first: RankingType: open { score: double },
>     > > >    second: RankingType: open {score: double},
>     > > >    third: RankingType: open { score: double},
>     > > >    fourth: RankingType: open {score: double},
>     > > >    fifth: RankingType: open {score: double}
>     > > > } , null, missing) is not a valid record type!
>     > > >
>     > > > Could you maybe point me in the right direction?
>     > > > Thanks in advance!
>     > > >
>     > > > Best,
>     > > > Sandra
>     > > >
>     > > 
>     > 
>     
> 
>
