Hi Sandra,

In short: yes, you can use a Java UDF to do that.
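As a rough illustration of the filtering logic such a UDF could apply, here is a minimal plain-Java sketch. It is not the code from the template repository: the class and method names are hypothetical, the AsterixDB UDF interface wiring is left out, and the keywords are passed in directly rather than loaded from "UserQueries".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of AsterixDB
// or of the asterix-udf-template repository.
public class KeywordFilter {
    private final Set<String> keywords = new HashSet<>();

    // Assumption: in a real UDF the keywords would be read from the
    // "UserQueries" dataset; here they are simply handed to the constructor.
    public KeywordFilter(List<String> keywordList) {
        for (String keyword : keywordList) {
            keywords.add(keyword.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true if any whitespace-separated word of the text is a keyword.
    // Punctuation is not stripped, so "flood," would not match "flood".
    public boolean matches(String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (keywords.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        KeywordFilter filter = new KeywordFilter(Arrays.asList("storm", "flood"));
        System.out.println(filter.matches("Heavy flood warning issued"));
        System.out.println(filter.matches("Sunny all week"));
    }
}
```

Keeping the keywords in a hash set makes each lookup constant time, which matters when the check runs once per incoming record.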
One thing worth noting: whether you use a Java UDF or a SQL++ UDF, there can be issues in some cases, because you are accessing a dataset from a feed pipeline while that dataset is being actively fed by another data feed. I recently submitted a paper that discusses a similar problem, and it also includes examples of using SQL++ UDFs and Java UDFs on a feed pipeline. I've attached the latest draft of that paper; it is also on arXiv [1] (the latest draft is still being processed there). Please have a look and let me know whether that helps.

[1] https://arxiv.org/abs/1902.08271

Best,
Xikui

On Thu, Feb 28, 2019 at 12:54 AM [email protected] <[email protected]> wrote:

> Hi!
>
> I am trying to understand how to access data stored in a dataset, say the
> dataset "UserQueries", from a UDF. Say the intent of the given UDF is
> similar to the "WordInList" UDF created here:
> https://github.com/idleft/asterix-udf-template/blob/master/src/main/java/org/apache/asterix/external/library/WordInListFunction.java
>
> The possible pipeline of the system would look like this:
> A socket feed is created and started, which listens to incoming data of
> the type "UserQuery". I've created a user interface which will send data to
> the specific socket in ADM format. This data is stored in the dataset
> "UserQueries". Then, I wish to access the data in a given record within
> "UserQueries" to find the keywords to use in the WordInList UDF. This
> function/UDF is then going to be used as a query predicate to filter the
> incoming data.
>
> Must the UDF be written in SQL++ format in order to achieve this, or is it
> possible to write it in Java? The "Data Ingestion in AsterixDB" article
> specifies that the former format is a good fit when the pre-processing of a
> record requires the result of a query, and I can't find any documentation
> doing this with a Java UDF.
>
> If the UDF must be written in SQL++ in order to accomplish this, I am
> thinking something like this:
>
> create function GetUserQueryKeywords(userId) {
>     (select q.keywords from UserQueries q
>      where q.userid = userId
>      and q.timestamp > current_datetime() - daytime_duration("PT10"))
> };
>
> Could you maybe point me in the right direction of how to use such query
> results as input for a UDF like WordInList, if possible?
>
> Thanks in advance.
>
> Best regards,
> Sandra
