Hi Sandra,

Change your CONNECT FEED statement to something like this:
CONNECT FEED TestSocketFeed TO DATASET RelevantDataset
APPLY function testlib#detectRelevance WHERE
testlib#wordDetector(TestSocketFeed.text) = TRUE;

When a CONNECT FEED statement uses APPLY FUNCTION, qualify every field you
refer to in the WHERE clause with the feed name (here, TestSocketFeed.text
rather than just text).
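
One way to picture why this helps, borrowing the insert-statement analogy
from Xikui's earlier reply quoted below: as far as I understand it, the feed
name plays the role of the alias for each incoming record, much like f in his
examples. This is only a sketch under that assumption; FeedDataset is the
imaginary dataset from his analogy, not something you can actually query:

```sql
-- Illustrative only: FeedDataset is the hypothetical dataset from the
-- analogy, and using the feed name as the record alias is an assumption.
INSERT INTO RelevantDataset (
  SELECT VALUE testlib#detectRelevance(TestSocketFeed)
  FROM FeedDataset TestSocketFeed
  WHERE testlib#wordDetector(TestSocketFeed.text) = TRUE
);
```

Written this way, TestSocketFeed.text points at exactly one field of the
incoming record, whereas a bare text presumably leaves the compiler unable to
decide which variable in scope it belongs to.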

Thanks
Ali

On Sun, Dec 9, 2018 at 11:26 AM [email protected] <
[email protected]> wrote:

> Hi Xikui, thank you for your response, and for making things clear! :-)
>
> I made the query predicate work! However, it only worked when applying the
> testlib#filter_func() (wordDetector), not the testlib#process_func. When
> trying the latter, I got an error saying "ASX1074: Cannot resolve ambiguous
> alias reference for identifier text (in line 1, at column 22)
> [CompilationException]" when executing the query in [2]. I noticed that in
> the example provided by you earlier, the connect statement (like in [2]),
> is connecting the feed to a dataset of the same type as the input type.
> However, I have connected the feed adapter to RelevantDataset, which holds
> the data type created when the UDF has processed the incoming TweetType
> (so the UDF takes TweetType as input, but outputs data in the format
> TweetRelevantType). Some context on what my datasets and data types look
> like is shown below.
>
> First, I execute query [1], creating the dataverse, data types and the
> data sets, then query [2] is run.
>
> [1]
> CREATE DATAVERSE relevance;
> USE relevance;
> CREATE TYPE TweetType AS OPEN {
>   id: int32,
>   text: string,
>   threadid: int32,
>   relevant: boolean };
> CREATE TYPE TweetRelevantType AS CLOSED {
>   threadid: int32,
>   relevant: boolean,
>   id: int32,
>   tweet: string };
> CREATE DATASET RelevantDataset(TweetRelevantType) PRIMARY KEY id;
> CREATE feed TestSocketFeed WITH {
>   "adapter-name": "socket_adapter",
>   "sockets": "127.0.0.1:10002",
>   "address-type": "IP",
>   "type-name": "TweetType",
>   "format": "adm"
> };
>
> Initially, the field called "tweet" inside TweetRelevantType was called
> "text", but I tried to change it to "tweet" due to the error that occurred.
> It did not make any difference.
>
> Then, this query is executed, and this is where the error occurs, stating
> that "text" (input to wordDetector) is ambiguous:
>
> [2]
> CONNECT FEED TestSocketFeed TO DATASET RelevantDataset
> APPLY function testlib#detectRelevance WHERE testlib#wordDetector(text) =
> TRUE;
> START FEED TestSocketFeed;
>
> I have tried to make it work, but cannot seem to find the solution. Do
> you see something I am doing wrong, maybe related to how I connect the feed
> to the dataset when the UDF requires the output to be in another format
> than the input? Thanks in advance!
>
> Best,
> Sandra
>
> On 2018/12/07 01:14:50, Xikui Wang <[email protected]> wrote:
> > Hi Sandra,
> >
> > Yes. It will store the entire record. Note that applying a function to a
> > feed is different from adding a filter to a feed. To help you understand
> > the difference better, here is an example.
> >
> > Imagine the data feed as a big dataset called FeedDataset, and you want to
> > store the ingested data into the TargetDataset. An equivalent statement
> > that moves data from the feed to the target dataset looks like this:
> >
> > insert into TargetDataset(select value f from FeedDataset f);
> >
> > If you apply a function called "testlib#process_func" to the feed, the
> > equivalent statement looks like this:
> >
> > insert into TargetDataset(select value testlib#process_func(f) from
> > FeedDataset f);
> >
> > If you have a filter function called "testlib#filter_func", and you add it
> > to the feed using the WHERE clause, the equivalent statement becomes this:
> >
> > insert into TargetDataset(select value testlib#process_func(f) from
> > FeedDataset f where testlib#filter_func(f) == TRUE);
> >
> > Thus, the filter function and the applied function are two separate
> > things, and they are orthogonal. In the last example, some incoming data
> > are filtered out by the function (filter_func) in the WHERE clause, and
> > the remaining incoming data will still be processed by the applied
> > function (process_func). You can use whichever one fits your needs. :)
> >
> > Best,
> > Xikui
> >
> > On Thu, Dec 6, 2018 at 8:21 AM [email protected] <
> > [email protected]> wrote:
> >
> > > Hi again!
> > >
> > > I am currently trying to make use of the filtering by query predicate
> > > example which was discussed in another thread here ("Build UDF project"),
> > > see below:
> > >
> > > connect feed UserFeed to dataset EmpDataset WHERE
> > > testlib#wordDetector(fname) = TRUE;
> > > start feed UserFeed;
> > >
> > > using the wordDetector UDF found here:
> > >
> > > https://github.com/idleft/asterix-udf-template/blob/master/src/main/java/org/apache/asterix/external/library/WordInListFunction.java
> > >
> > > However, the output type of this UDF, as defined in
> > > library_descriptor.xml, is "ABOOLEAN". Will it still store the entire
> > > record (InputRecordType) in the EmpDataset, or only the boolean value?
> > > And, if I would like to use the records which pass the filtering in
> > > wordDetector as input to another UDF, would I need to change the output
> > > type of the UDF? If so, the check "testlib#wordDetector(fname) = TRUE;"
> > > will not work anymore, due to the output being an entire record instead
> > > of only a boolean.
> > >
> > > I appreciate your help!
> > >
> > > Best regards,
> > > Sandra
> > >
> > >
> > >
> >
>


-- 
Regards,
