Hi Sandra,

To clarify the Java UDF option: you *cannot* access AsterixDB datasets
from inside a Java UDF. To approximate what you want with a Java UDF, you
can have the UDF load the reference data from *files* and update those
files externally, at the high cost of reloading them from time to time.
Both options are discussed in the paper.
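
As a rough illustration of the file-based approach, here is a minimal
sketch in plain Java. It is not taken from the paper or from the AsterixDB
UDF API: the class name ReloadingKeywordList, the one-keyword-per-line file
format, and the time-based reload interval are all assumptions made for
this example. A real Java UDF would need to wrap something like this in
the external function interface that AsterixDB expects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of AsterixDB): keeps a keyword set in memory
// and re-reads it from a file once a configurable interval has elapsed.
class ReloadingKeywordList {
    private final Path file;                 // assumed format: one keyword per line
    private final long reloadIntervalMillis; // assumed reload policy: time-based
    private Set<String> keywords = new HashSet<>();
    private long lastLoadMillis;

    ReloadingKeywordList(Path file, long reloadIntervalMillis) throws IOException {
        this.file = file;
        this.reloadIntervalMillis = reloadIntervalMillis;
        reload();
    }

    // Reads the whole file again; this is the recurring cost mentioned above.
    private void reload() throws IOException {
        Set<String> fresh = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                fresh.add(word);
            }
        }
        keywords = fresh;
        lastLoadMillis = System.currentTimeMillis();
    }

    // Returns true if any whitespace-separated token of the text is a keyword.
    boolean containsAny(String text) throws IOException {
        if (System.currentTimeMillis() - lastLoadMillis >= reloadIntervalMillis) {
            reload();
        }
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (keywords.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

The file would be rewritten by some process outside the feed, and the
interval trades freshness of the keywords against the reload cost.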

Best,
Xikui

On Thu, Feb 28, 2019 at 9:07 AM Xikui Wang <[email protected]> wrote:

> Hi Sandra,
>
> To answer your question in short: Yes, you can use a Java UDF to do that.
>
> One thing worth noting is that, whether you use a Java UDF or a SQL++
> UDF, there can be issues in some cases, because you are accessing a
> dataset on a feed pipeline while that dataset is being actively fed by
> another data feed. I recently submitted a paper that discusses a similar
> problem. It also contains some examples of using SQL++ UDFs or Java UDFs
> on a feed pipeline. I've attached the latest draft of that paper, and it's
> on arXiv as well [1] (the latest draft is still being processed there).
> Please have a look and let me know whether that helps.
>
> [1] https://arxiv.org/abs/1902.08271
>
> Best,
> Xikui
>
> On Thu, Feb 28, 2019 at 12:54 AM [email protected] <
> [email protected]> wrote:
>
>> Hi!
>>
>> I am trying to understand how to access data stored in a dataset, say the
>> dataset "UserQueries", from a UDF. Say the intent of the given UDF is
>> similar to the "WordInList" UDF created here:
>> https://github.com/idleft/asterix-udf-template/blob/master/src/main/java/org/apache/asterix/external/library/WordInListFunction.java
>>
>> The possible pipeline of the system would look like this:
>> A socket feed is created and started, which listens to incoming data of
>> the type "UserQuery". I’ve created a user interface which will send data to
>> the specific socket in ADM format. This data is stored in the dataset
>> "UserQueries". Then, I wish to access the data in a given record within
>> "UserQueries" to find the keywords to use in the WordInList UDF. This
>> function/UDF is then going to be used as a query predicate to filter the
>> incoming data.
>>
>> Must the UDF be written in SQL++ in order to achieve this, or is it
>> possible to write it in Java? The “Data Ingestion in AsterixDB” article
>> specifies that the former is a good fit when the pre-processing of a
>> record requires the result of a query, and I can’t find any documentation
>> on doing this with a Java UDF.
>>
>> If the UDF must be written in SQL++ in order to accomplish this, I am
>> thinking something like this:
>>
>> create function GetUserQueryKeywords(userId) {
>>     (select q.keywords from UserQueries q
>>        where q.userid = userId
>>        and q.timestamp > current_datetime() - daytime_duration("PT10"))
>> };
>>
>> Could you maybe point me in the right direction on how to use such query
>> results as input for a UDF like WordInList, if possible?
>>
>> Thanks in advance.
>>
>> Best regards,
>> Sandra
>>
>>
