Hi Sandra,

Just to clarify the Java UDF implementation: you *cannot* access AsterixDB datasets from a Java UDF. To approximate what you want with a Java UDF, you can load the reference data *files* inside the UDF and update those files externally, at the high cost of reloading them from time to time. Both options are discussed in the paper.
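
For illustration only, the file-based workaround could be sketched in plain Java roughly as follows. `ReloadingKeywordList`, its constructor arguments, and the one-keyword-per-line file format are assumptions made for this sketch; they are not part of AsterixDB or of the linked UDF template, and the wiring into an actual UDF class is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not an AsterixDB class): caches a keyword file in memory. */
final class ReloadingKeywordList {
    private final Path file;                 // reference data file, one keyword per line (assumed format)
    private final long checkIntervalMillis;  // minimum time between checks of the file's timestamp
    private Set<String> keywords = new HashSet<>();
    private FileTime loadedVersion;          // modification time of the version currently in memory
    private long lastCheckMillis;

    ReloadingKeywordList(Path file, long checkIntervalMillis) throws IOException {
        this.file = file;
        this.checkIntervalMillis = checkIntervalMillis;
        reload();
    }

    /** Reads the whole file again; this is the expensive step. */
    private void reload() throws IOException {
        FileTime version = Files.getLastModifiedTime(file);
        Set<String> fresh = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                fresh.add(word);
            }
        }
        keywords = fresh;
        loadedVersion = version;
        lastCheckMillis = System.currentTimeMillis();
    }

    /** Reloads only if the check interval has elapsed and the file's timestamp changed. */
    private void refreshIfStale() throws IOException {
        long now = System.currentTimeMillis();
        if (now - lastCheckMillis < checkIntervalMillis) {
            return;
        }
        lastCheckMillis = now;
        if (!Files.getLastModifiedTime(file).equals(loadedVersion)) {
            reload();
        }
    }

    /** True if any whitespace-separated token of the text is a keyword (punctuation is not stripped). */
    boolean containsAny(String text) throws IOException {
        refreshIfStale();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (keywords.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

In a real UDF the object would presumably be created once when the function is initialized and `containsAny` called once per incoming record. A larger check interval lowers the reload cost but lets the in-memory keywords lag further behind the file.
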
Best,
Xikui

On Thu, Feb 28, 2019 at 9:07 AM Xikui Wang <[email protected]> wrote:

> Hi Sandra,
>
> To answer your question in short: Yes, you can use a Java UDF to do that.
>
> One thing worth noticing is that, whether you use a Java UDF or a SQL++
> UDF, there can be issues in some cases, as you are accessing a dataset on a
> feed pipeline and that dataset is being actively fed by the other data
> feed. I recently submitted a paper that discussed a similar problem. There
> are some examples of using SQL++ UDFs or Java UDFs on a feed pipeline in
> the paper as well. I've attached the latest draft of that paper, and it's
> on arXiv as well [1] (the latest draft is under processing). Please have a
> look and let me know whether that helps.
>
> [1] https://arxiv.org/abs/1902.08271
>
> Best,
> Xikui
>
> On Thu, Feb 28, 2019 at 12:54 AM [email protected] <
> [email protected]> wrote:
>
>> Hi!
>>
>> I am trying to understand how to access data stored in a dataset, say the
>> dataset "UserQueries", from a UDF. Say the intent of the given UDF is
>> similar to the "WordInList" UDF created here:
>> https://github.com/idleft/asterix-udf-template/blob/master/src/main/java/org/apache/asterix/external/library/WordInListFunction.java
>>
>> The possible pipeline of the system would look like this:
>> A socket feed is created and started, which listens to incoming data of
>> the type "UserQuery". I’ve created a user interface which will send data to
>> the specific socket in ADM format. This data is stored in the dataset
>> "UserQueries". Then, I wish to access the data in a given record within
>> "UserQueries" to find the keywords to use in the WordInList UDF. This
>> function/UDF is then going to be used as a query predicate to filter the
>> incoming data.
>>
>> Must the UDF be written in SQL++ format in order to achieve this, or is
>> it possible to write it in Java? The “Data Ingestion in AsterixDB” article
>> specifies that the former format is a good fit when the pre-processing of a
>> record requires the result of a query, and I can’t find any documentation
>> doing this with a Java UDF.
>>
>> If the UDF must be written in SQL++ in order to accomplish this, I am
>> thinking something like this:
>>
>> create function GetUserQueryKeywords(userId) {
>>     (select q.keywords from UserQueries q
>>      where q.userid = userId
>>      and q.timestamp > current_datetime() - day_time_duration("PT10"))
>> };
>>
>> Could you maybe point me in the right direction of how to use such query
>> results as input for a UDF like WordInList, if possible?
>>
>> Thanks in advance.
>>
>> Best regards,
>> Sandra
