Hi Sandra,

To answer your question in short: yes, you can use a Java UDF to do that.

One thing worth noting: whether you use a Java UDF or a SQL++ UDF, you
may run into issues in some cases, because the UDF accesses a dataset from
within a feed pipeline while that dataset is being actively fed by another
data feed. I recently submitted a paper that discusses a similar problem,
and it includes examples of using SQL++ UDFs and Java UDFs on a feed
pipeline. I've attached the latest draft of that paper; it is also on
arXiv [1], although the newest version is still being processed there.
Please have a look and let me know whether it helps.

[1] https://arxiv.org/abs/1902.08271

Best,
Xikui

On Thu, Feb 28, 2019 at 12:54 AM [email protected] <
[email protected]> wrote:

> Hi!
>
> I am trying to understand how to access data stored in a dataset, say the
> dataset "UserQueries", from a UDF. Say the intent of the given UDF is
> similar to the "WordInList" UDF created here:
> https://github.com/idleft/asterix-udf-template/blob/master/src/main/java/org/apache/asterix/external/library/WordInListFunction.java
>
> The possible pipeline of the system would look like this:
> A socket feed is created and started, which listens to incoming data of
> the type "UserQuery". I’ve created a user interface which will send data to
> the specific socket in ADM format. This data is stored in the dataset
> "UserQueries". Then, I wish to access the data in a given record within
> "UserQueries" to find the keywords to use in the WordInList UDF. This
> function/UDF is then going to be used as a query predicate to filter the
> incoming data.
>
> Must the UDF be written in SQL++ format in order to achieve this, or is it
> possible to write it in Java? The “Data Ingestion in AsterixDB” article
> specifies that the former format is a good fit when the pre-processing of a
> record requires the result of a query, and I can’t find any documentation
> doing this with a Java UDF.
>
> If the UDF must be written in SQL++ in order to accomplish this, I am
> thinking something like this:
>
> create function GetUserQueryKeywords(userId) {
>     (select q.keywords from UserQueries q
>        where q.userid = userId
>        and q.timestamp > current_datetime() - day_time_duration("PT10"))
> };
>
> Could you maybe point me in the right direction on how to use such query
> results as input for a UDF like WordInList, if possible?
>
> Thanks in advance.
>
> Best regards,
> Sandra
>
>
