I forgot to add a link to the paper about the decoupled ingestion framework [1].

[1] https://arxiv.org/pdf/1902.08271.pdf

On 2019/04/25 22:54:15, [email protected] <[email protected]> wrote:
> Hi, thanks for your reply! I will try to be a bit more precise :-) 
> 
> I am currently testing the decoupled framework, and I would like to use data 
> from another dataset, namely RankingResult, when enriching tweets. 
> Additionally, I would like to send the incoming tweet, as well as a record 
> from RankingResult (say, the one with id = 1), to a Java UDF (called from 
> within the SQL++ UDF) for more complex processing, such as clustering the 
> tweets and scoring them by how relevant they are to a given topic. 
> The scoring within the Java UDF requires information from the record stored 
> in RankingResult.
> 
> When the SQL++ UDF is applied to a TwitterFeed, I want to check whether an 
> incoming tweet scores higher than the tweets in the RankingResult record 
> (which holds the top-ranked tweets for the given topic). I see now that I 
> can select the record I want with "SELECT VALUE r FROM RankingResult r where id=1". 
> One can think of the RankingResult dataset as holding one record per 
> topic/user query for which I want to find the top k most relevant tweets.
> 
> The overall goal of the project is to see whether AsterixDB can be used to 
> continuously rank tweets in real time with respect to a user-defined query, 
> which means that the RankingResult record for that query must be updated 
> continuously. I am also looking into keeping a TreeMap in the Java UDF 
> that holds the current top tweets and their scores, and using it to decide 
> whether an incoming tweet should take the place of any of the top-ranked 
> tweets. However, I would still like to update the RankingResult record so 
> that the data stays queryable.
> 
> 
> Thanks in advance,
> Sandra
> 
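Sandra mentions keeping the current top tweets and their scores in a TreeMap inside the Java UDF, and using it to decide whether an incoming tweet should replace one of the top-ranked tweets. A minimal sketch of that bookkeeping in plain Java, with no AsterixDB dependencies, might look like the following. The class `TopKRanking`, its `offer`/`ranked` methods, and the tie-breaking by tweet id are hypothetical illustrations, not part of AsterixDB's UDF API; wiring this into `testlib#detectRelevance` and writing results back to RankingResult are not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical helper, not part of AsterixDB: keeps the k highest-scoring
 * tweets seen so far in a TreeMap ordered from lowest to highest score.
 */
public class TopKRanking {

    /** Composite key so that two tweets with equal scores do not overwrite each other. */
    static final class ScoreKey {
        final double score;
        final long tweetId;

        ScoreKey(double score, long tweetId) {
            this.score = score;
            this.tweetId = tweetId;
        }
    }

    // Ascending by score; ties broken by tweet id so that every key is unique.
    private static final Comparator<ScoreKey> ORDER =
            Comparator.<ScoreKey>comparingDouble(key -> key.score)
                    .thenComparingLong(key -> key.tweetId);

    private final int k;
    private final TreeMap<ScoreKey, String> top = new TreeMap<>(ORDER);

    public TopKRanking(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    /**
     * Offers an incoming tweet (assumes each tweet id is offered once).
     * Returns true if it entered the top k, evicting the lowest-ranked
     * tweet when the ranking is already full.
     */
    public boolean offer(long tweetId, String text, double score) {
        if (top.size() >= k) {
            // firstKey() is the lowest-ranked tweet currently held; the
            // newcomer must score strictly higher to take its place.
            if (score <= top.firstKey().score) {
                return false;
            }
            top.pollFirstEntry();
        }
        top.put(new ScoreKey(score, tweetId), text);
        return true;
    }

    /** Current top tweets, highest score first. */
    public List<String> ranked() {
        return new ArrayList<>(top.descendingMap().values());
    }

    public static void main(String[] args) {
        TopKRanking ranking = new TopKRanking(3);
        ranking.offer(1, "tweet A", 0.2);
        ranking.offer(2, "tweet B", 0.9);
        ranking.offer(3, "tweet C", 0.5);
        boolean entered = ranking.offer(4, "tweet D", 0.7);  // evicts tweet A
        boolean rejected = ranking.offer(5, "tweet E", 0.1); // below current minimum
        // Prints: true false [tweet B, tweet D, tweet C]
        System.out.println(entered + " " + rejected + " " + ranking.ranked());
    }
}
```

A TreeMap gives O(log k) access to the current lowest-ranked entry through `firstKey()` and `pollFirstEntry()`; a `PriorityQueue` would serve the same purpose. State held in a Java object like this is not visible to queries, which is why the RankingResult record would still need to be updated to keep the ranking queryable.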
> On 2019/04/25 22:10:56, Mike Carey <[email protected]> wrote: 
> > I will let someone else chime in on what the compilation error might be 
> > about, but approach 1 has the problem that you rightly tried to correct 
> > in approach 2 (because SELECT always returns an array of results). But 
> > could you say a bit more, from 5000 feet up, about the use case you are 
> > trying to address? It's not clear (to me) why one would want a 
> > single-item dataset. Perhaps that's just part of your 
> > trying-to-make-this-work debugging, but it might help if the group 
> > could see what you are trying to do overall. (E.g., if you just want to 
> > process incoming records on a feed, you wouldn't need another dataset 
> > for that. What's the more general picture/desire?)
> > 
> > Cheers,
> > 
> > Mike
> > 
> > On 4/25/19 12:08 AM, [email protected] wrote:
> > > Hi devs!
> > >
> > > Given a datatype RankingResultType and a dataset 
> > > RankingResult(RankingResultType) that contains only one record, what is 
> > > the correct approach for passing that single RankingResult record as an 
> > > argument to a Java UDF from within a SQL++ UDF? The record returned by 
> > > the Java UDF should be selected at the end of the SQL++ UDF, since it is 
> > > going to be stored in the dataset that the feed using the SQL++ UDF is 
> > > attached to.
> > >
> > > CREATE FUNCTION rank(newItem) {
> > >   LET rankingResult = *must select the record here*
> > >   SELECT testlib#detectRelevance(newItem, *must pass RankingResult record here*)
> > > };
> > >
> > > I have tried a few different approaches, for instance:
> > > 1. LET rankingResult = (SELECT VALUE r FROM RankingResult r)
> > >    SELECT testlib#detectRelevance(newItem, rankingResult)
> > > 2. LET rankingResult = (SELECT VALUE r FROM RankingResult r)[0]
> > >    SELECT testlib#detectRelevance(newItem, rankingResult)
> > >
> > > The first approach throws a TypeMismatchException, ASX1002: Type 
> > > mismatch: function testlib#detectRelevance expects its 2nd input 
> > > parameter to be of type object, but the actual input type is array
> > >
> > > So in the second approach I tried to access the first element of the 
> > > array, but that version does not compile:
> > > SX1079: Compilation error: The input type union(RankingResultType: closed 
> > > {
> > >    id: bigint,
> > >    first: RankingType: open { score: double },
> > >    second: RankingType: open {score: double},
> > >    third: RankingType: open { score: double},
> > >    fourth: RankingType: open {score: double},
> > >    fifth: RankingType: open {score: double}
> > > } , null, missing) is not a valid record type!
> > >
> > > Could you maybe point me in the right direction?
> > > Thanks in advance!
> > >
> > > Best,
> > > Sandra
> > >
> > 
> 
