Hi Sandra,

Change your CONNECT FEED statement to something like this:

CONNECT FEED TestSocketFeed TO DATASET RelevantDataset
APPLY function testlib#detectRelevance WHERE testlib#wordDetector(TestSocketFeed.text) = TRUE;
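
As a rough mental model of what this statement does, going by the equivalent INSERT statements Xikui gave earlier in this thread: the WHERE predicate is evaluated on the incoming feed record, and the dataset then receives whatever the applied function returns. The sketch below is plain Python, not AsterixDB code. The keyword list and the body of detect_relevance are made up for illustration, since the real UDFs live in the Java library.

```python
# Plain-Python model of "filter, then apply"; NOT AsterixDB code.
# KEYWORDS and the function bodies are hypothetical stand-ins for the Java UDFs.
KEYWORDS = {"fire", "flood"}  # hypothetical word list


def word_detector(text):
    """Stand-in for testlib#wordDetector: True if any keyword occurs in the text."""
    return any(word in KEYWORDS for word in text.lower().split())


def detect_relevance(record):
    """Stand-in for testlib#detectRelevance: TweetType in, TweetRelevantType out."""
    return {
        "threadid": record["threadid"],
        "relevant": True,  # placeholder value, not the real relevance logic
        "id": record["id"],
        "tweet": record["text"],  # the input field "text" becomes "tweet"
    }


def connect_feed(feed_records):
    """The filter reads the incoming record; the dataset stores the function output."""
    return [detect_relevance(r) for r in feed_records if word_detector(r["text"])]


feed = [
    {"id": 1, "text": "Fire downtown", "threadid": 7, "relevant": False},
    {"id": 2, "text": "Nice weather", "threadid": 8, "relevant": False},
]
relevant_dataset = connect_feed(feed)
```

In this toy run only the first record passes the filter, and the stored record carries a tweet field instead of text, which matches the shape of TweetRelevantType.
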
When your CONNECT FEED statement uses APPLY FUNCTION, qualify every field you refer to with the feed name.

Thanks
Ali

On Sun, Dec 9, 2018 at 11:26 AM [email protected] <[email protected]> wrote:

> Hi Xikui, thank you for your response, and for making things clear! :-)
>
> I made the query predicate work! However, it only worked when applying the
> testlib#filter_func() (wordDetector), not the testlib#process_func. When
> trying the latter, I got an error saying "ASX1074: Cannot resolve ambiguous
> alias reference for identifier text (in line 1, at column 22)
> [CompilationException]" when executing the query in [2]. I noticed that in
> the example you provided earlier, the connect statement (like in [2])
> connects the feed to a dataset of the same type as the input type.
> However, I have connected the feed adapter to RelevantDataset, which holds
> the data type created when the UDF has processed the incoming TweetType
> (so the UDF takes TweetType as input, but outputs data in the format
> TweetRelevantType). Some context on what my datasets and data types look
> like is shown below.
>
> First, I execute query [1], creating the dataverse, data types and the
> datasets, then query [2] is run.
>
> [1]
> CREATE DATAVERSE relevance;
> USE relevance;
> CREATE TYPE TweetType AS OPEN {
>     id: int32,
>     text: string,
>     threadid: int32,
>     relevant: boolean };
> CREATE TYPE TweetRelevantType AS CLOSED {
>     threadid: int32,
>     relevant: boolean,
>     id: int32,
>     tweet: string };
> CREATE DATASET RelevantDataset(TweetRelevantType) PRIMARY KEY id;
> CREATE feed TestSocketFeed WITH {
>     "adapter-name": "socket_adapter",
>     "sockets": "127.0.0.1:10002",
>     "address-type": "IP",
>     "type-name": "TweetType",
>     "format": "adm"
> }
>
> Initially, the field called "tweet" inside TweetRelevantType was called
> "text", but I changed it to "tweet" because of the error that occurred.
> It did not make any difference.
>
> Then, this query is executed, and this is where the error occurs, stating
> that "text" (input to wordDetector) is ambiguous:
>
> [2]
> CONNECT FEED TestSocketFeed TO DATASET RelevantDataset
> APPLY function testlib#detectRelevance WHERE testlib#wordDetector(text) = TRUE;
> START FEED TestSocketFeed;
>
> I have tried to make it work, but cannot seem to find the solution. Do
> you see something I am doing wrong, maybe related to how I connect the feed
> to the dataset when the UDF requires the output to be in another format
> than the input? Thanks in advance!
>
> Best,
> Sandra
>
> On 2018/12/07 01:14:50, Xikui Wang <[email protected]> wrote:
> > Hi Sandra,
> >
> > Yes. It will store the entire record. Note that applying a function to a
> > feed is different from adding a filter to a feed. To help you understand
> > the difference better, here is an example.
> >
> > Imagine the data feed as a big dataset called FeedDataset, and you want to
> > store the ingested data into the TargetDataset. An equivalent statement
> > that moves data from the feed to the target dataset looks like this:
> >
> > insert into TargetDataset(select value f from FeedDataset f);
> >
> > If you apply a function called "testlib#process_func" to the feed, the
> > equivalent statement is like this:
> >
> > insert into TargetDataset(select value testlib#process_func(f) from
> > FeedDataset f);
> >
> > If you have a filter function called "testlib#filter_func", and you add it
> > to the feed using the WHERE clause, the equivalent statement becomes this:
> >
> > insert into TargetDataset(select value testlib#process_func(f) from
> > FeedDataset f where testlib#filter_func(f) == TRUE);
> >
> > Thus, the filter function and the applied function are two separate things
> > and they are orthogonal.
> > In the last example, some incoming data are filtered out by
> > the function (filter_func) in the WHERE clause, and the remaining incoming
> > data will still be processed by the applied function (process_func). You
> > can use whichever one fits your needs. :)
> >
> > Best,
> > Xikui
> >
> > On Thu, Dec 6, 2018 at 8:21 AM [email protected] <[email protected]> wrote:
> >
> > > Hi again!
> > >
> > > I am currently trying to make use of the filtering by query predicate
> > > example which was discussed in another thread here ("Build UDF project"),
> > > see below:
> > >
> > > connect feed UserFeed to dataset EmpDataset WHERE
> > > testlib#wordDetector(fname) = TRUE;
> > > start feed UserFeed;
> > >
> > > using the wordDetector UDF found here:
> > > https://github.com/idleft/asterix-udf-template/blob/master/src/main/java/org/apache/asterix/external/library/WordInListFunction.java
> > >
> > > However, the output type of this UDF, as defined in library_descriptor.xml,
> > > is "ABOOLEAN". Will it still store the entire record (InputRecordType) in
> > > the EmpDataset, or only the boolean value? And, if I would like to use the
> > > records which pass the filtering in wordDetector as input to another UDF,
> > > would I need to change the output type of the UDF? If so, the check
> > > "testlib#wordDetector(fname) = TRUE;" will not work anymore, due to the
> > > output being an entire record instead of only a boolean.
> > >
> > > I appreciate your help!
> > >
> > > Best regards,
> > > Sandra

--
Regards,
