I forgot to add a link to the paper about the decoupled ingestion framework [1].
[1] https://arxiv.org/pdf/1902.08271.pdf

On 2019/04/25 22:54:15, [email protected] <[email protected]> wrote:
> Hi, thanks for your reply! I will try to be a bit more precise :-)
>
> I am currently testing the decoupled framework, and I would like to use data
> from another dataset when enriching tweets, here being data from the
> RankingResult dataset. Additionally, I would like to send the incoming tweet,
> as well as a record from RankingResult (say with id = 1) to a Java UDF (from
> within the SQL++ UDF) for more complex processing, like clustering the
> tweets, and scoring them based on how relevant they are for a given topic.
> The scoring within the Java UDF requires information about the record stored
> in RankingResult.
>
> Applying the SQL++ UDF to a TwitterFeed, I aim to check whether a tweet is
> scored higher than the tweets found in the RankingList record (containing the
> top ranked tweets for the given topic). I see now that I could select the
> record I wish to use by "SELECT VALUE r FROM RankingResult r where id=1". One
> can think of the RankingResult dataset to hold one record per topic/user
> query which I want to find the top k most relevant tweets for.
>
> The overall goal of the project is to see if AsterixDB can be used to
> continuously rank tweets in real-time with respect to a user-defined query,
> meaning that the RankingResult record for the given user query should be
> updated continuously. I am however also looking into creating a TreeMap data
> structure in the Java UDF to hold the top current tweets and their scores,
> and use this for deciding whether the incoming tweet should switch place with
> any of the top ranked tweets. However, I would like to update the
> RankingResult record in order to make the data queryable.
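To make the TreeMap idea above a bit more concrete, here is a minimal sketch in plain Java. The class name TopKRanker and its methods offer() and ranking() are hypothetical and not part of AsterixDB or its UDF API. The sketch only shows the "should the incoming tweet replace one of the top k" decision. How such state would be kept inside a Java UDF across invocations, and how it would be written back to RankingResult, is not covered and is left as an open assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not an AsterixDB class): keeps the k highest-scored tweet ids. */
final class TopKRanker {
    private final int k;
    private int size = 0;
    // score -> ids with that score; the TreeMap keeps scores in ascending order,
    // so the weakest current entry is always at firstEntry().
    private final TreeMap<Double, Deque<String>> byScore = new TreeMap<>();

    TopKRanker(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    /** Returns true if the tweet entered the top k, evicting the weakest entry if full. */
    boolean offer(String tweetId, double score) {
        if (size == k) {
            Map.Entry<Double, Deque<String>> lowest = byScore.firstEntry();
            if (score <= lowest.getKey()) {
                return false; // not strictly better than the current weakest tweet
            }
            // Evict one tweet from the lowest-scored bucket.
            lowest.getValue().removeLast();
            if (lowest.getValue().isEmpty()) {
                byScore.remove(lowest.getKey());
            }
            size--;
        }
        byScore.computeIfAbsent(score, s -> new ArrayDeque<>()).addLast(tweetId);
        size++;
        return true;
    }

    /** Current top tweets, highest score first. */
    List<String> ranking() {
        List<String> out = new ArrayList<>();
        for (Deque<String> ids : byScore.descendingMap().values()) {
            out.addAll(ids);
        }
        return out;
    }
}
```

A return value of true from offer() would be the point at which the corresponding RankingResult record needs to be updated to keep the data queryable.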
>
>
> Thanks in advance,
> Sandra
>
> On 2019/04/25 22:10:56, Mike Carey <[email protected]> wrote:
> > I will let someone else chime in on what the compilation error might be
> > about, but approach 1 has the problem that you rightly tried to correct
> > in approach 2 (because SELECT always returns an array of results). But
> > - could you say a bit more - up 5000 feet - about the use case you are
> > trying to address...? It's not clear (to me) why one might want to have
> > a single-item dataset - perhaps that's just a part of your
> > trying-to-make-this-work debugging - but it might help if the group
> > could see what you are trying to do overall. (E.g., if you just want to
> > process incoming records on a feed, you wouldn't need another dataset
> > for that. What's the more general picture/desire?)
> >
> > Cheers,
> >
> > Mike
> >
> > On 4/25/19 12:08 AM, [email protected] wrote:
> > > Hi devs!
> > >
> > > Given a datatype RankingResultType and a dataset
> > > RankingResult(RankingResultType) which contains only one record, what is
> > > the correct approach when I want to pass a single RankingResult record as
> > > an argument to a Java UDF in a SQL++ UDF? The resulting record of the
> > > Java UDF should be selected at the end of the UDF as it is going to be
> > > stored in the dataset the feed which uses the SQL++ UDF is attached to.
> > >
> > > CREATE FUNCTION rank(newItem) {
> > >   LET rankingResult = *must select the record here*,
> > >   SELECT testlib#detectRelevance(newItem, *must pass RankingResult record
> > >   here*)
> > > };
> > >
> > > I have tried some different approaches, for instance
> > > 1. running LET rankingResult = (SELECT VALUE r FROM RankingResult r)
> > >    SELECT testlib#detectRelevance(newItem, rankingResult)
> > > 2. running LET rankingResult = (SELECT VALUE r FROM RankingResult r)[0]
> > >    SELECT testlib#detectRelevance(newItem, rankingResult)
> > >
> > > The first approach throws a TypeMismatchException, ASX1002: Type
> > > mismatch: function testlib#detectRelevance expects its 2nd input
> > > parameter to be of type object, but the actual input type is array
> > >
> > > So I therefore tried to access the first element of the array in the
> > > second approach, but the second approach does not compile:
> > > SX1079: Compilation error: The input type union(RankingResultType: closed
> > > {
> > >   id: bigint,
> > >   first: RankingType: open { score: double },
> > >   second: RankingType: open {score: double},
> > >   third: RankingType: open { score: double},
> > >   fourth: RankingType: open {score: double},
> > >   fifth: RankingType: open {score: double}
> > > } , null, missing) is not a valid record type!
> > >
> > > Could you maybe point me in the right direction?
> > > Thanks in advance!
> > >
> > > Best,
> > > Sandra
