Hi Xikui, thank you for your response, and for making things clear! :-)

I got the query predicate to work! However, it only worked when applying the 
testlib#filter_func() (wordDetector), not the testlib#process_func. When trying 
the latter, I got an error saying "ASX1074: Cannot resolve ambiguous alias 
reference for identifier text (in line 1, at column 22) [CompilationException]" 
when executing the query in [2]. I noticed that in the example you provided 
earlier, the connect statement (like the one in [2]) connects the feed to a 
dataset of the same type as the input type. However, I have connected the feed 
adapter to RelevantDataset, which holds the data type produced when the UDF has 
processed the incoming TweetType (so the UDF takes TweetType as input, but 
outputs data in the format TweetRelevantType). Some context on what my datasets 
and data types look like is shown below.

First, I execute query [1], which creates the dataverse, the data types, the 
dataset and the feed; then I run query [2].

[1]
CREATE DATAVERSE relevance; 
USE relevance; 
CREATE TYPE TweetType AS OPEN { 
  id: int32, text: string, 
  threadid: int32, 
  relevant: boolean }; 
CREATE TYPE TweetRelevantType AS CLOSED { 
  threadid: int32, 
  relevant: boolean, 
  id: int32, 
  tweet: string }; 
CREATE DATASET RelevantDataset(TweetRelevantType) PRIMARY KEY id; 
CREATE FEED TestSocketFeed WITH { 
  "adapter-name": "socket_adapter", 
  "sockets": "127.0.0.1:10002", 
  "address-type": "IP", 
  "type-name": "TweetType", 
  "format": "adm" 
};

Initially, the field called "tweet" inside TweetRelevantType was called "text", 
but I renamed it to "tweet" because of the error. It did not make any 
difference. 

Then, this query is executed, and this is where the error occurs, stating that 
"text" (input to wordDetector) is ambiguous:

[2]
CONNECT FEED TestSocketFeed TO DATASET RelevantDataset 
APPLY function testlib#detectRelevance WHERE testlib#wordDetector(text) = TRUE; 
START FEED TestSocketFeed;
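
To check my understanding against the analogy in your earlier reply, I would 
expect [2] to correspond roughly to the statement below. FeedDataset is the 
imaginary dataset from your example standing in for the feed (it does not 
exist in my dataverse), so this is only a sketch of how I read the semantics:

```sql
-- Sketch only: FeedDataset is the hypothetical stand-in for TestSocketFeed.
INSERT INTO RelevantDataset(
  SELECT VALUE testlib#detectRelevance(f)    -- applied function: TweetType in, TweetRelevantType out
  FROM FeedDataset f
  WHERE testlib#wordDetector(text) = TRUE);  -- filter on the incoming record; "text" unqualified, as in [2]
```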

I have tried to make it work, but cannot seem to find the solution. Do you see 
something I am doing wrong, maybe related to how I connect the feed to the 
dataset when the UDF produces its output in a different format than the 
input? Thanks in advance!

Best,
Sandra

On 2018/12/07 01:14:50, Xikui Wang <[email protected]> wrote: 
> Hi Sandra,
> 
> Yes. It will store the entire record. Note that applying a function to a
> feed is different from adding a filter to a feed. To help you understand
> the difference better, here is an example.
> 
> Imagine the data feed as a big dataset called FeedDataset, and you want to
> store the ingested data into the TargetDataset. An equivalent statement
> that moves data from the feed to the target dataset looks like this:
> 
> insert into TargetDataset(select value f from FeedDataset f);
> 
> If you apply a function called "testlib#process_func" to the feed, the
> equivalent statement looks like this:
> 
> insert into TargetDataset(select value testlib#process_func(f) from
> FeedDataset f);
> 
> If you have a filter function called "testlib#filter_func", and you add it
> to the feed using the WHERE clause, the equivalent statement becomes this:
> 
> insert into TargetDataset(select value testlib#process_func(f) from
> FeedDataset f where testlib#filter_func(f) == TRUE);
> 
> Thus, the filter function and the applied function are two separate things,
> and they are orthogonal. In the last example, some incoming data is filtered
> out by the function (filter_func) in the where clause, and the remaining
> incoming data is still processed by the applied function (process_func). You
> can use whichever one fits your needs. :)
> 
> Best,
> Xikui
> 
> On Thu, Dec 6, 2018 at 8:21 AM [email protected] <
> [email protected]> wrote:
> 
> > Hi again!
> >
> > I am currently trying to make use of the filtering by query predicate
> > example which was discussed in another thread here ("Build UDF project"),
> > see below:
> >
> > *connect feed UserFeed to dataset EmpDataset WHERE
> > testlib#wordDetector(fname) = TRUE;*
> > start feed UserFeed;
> >
> > ... using the wordDetector UDF found here:
> > https://github.com/idleft/asterix-udf-template/blob/master/src/main/java/org/apache/asterix/external/library/WordInListFunction.java
> >
> > However, the output type of this UDF, as defined in library_descriptor.xml
> > is "ABOOLEAN". Will it still store the entire record (InputRecordType) in
> > the EmpDataset, or only the boolean value? And, if I would like to use the
> > records which pass the filtering in wordDetector as input to another UDF,
> > would I need to change the output type of the UDF? If so, the check
> > "testlib#wordDetector(fname) = TRUE" will not work anymore, due to the
> > output being an entire record instead of only a boolean.
> >
> > I appreciate your help!
> >
> > Best regards,
> > Sandra
> >
> >
> >
> 
