Hmmm.... I'm not sure where the Halloween problem is in this case - for
a given record being ingested, it's not in the dataset yet, and won't
get to move further through the pipeline to the point where it IS in the
dataset until after the query evaluation is over, the result has been
computed, and the new object (the one to be inserted) has been
determined. At least that's how it should work. There should thus be
no way for the ingestion pipeline query to see a record twice in a
self-join scenario, because it won't be in play in the dataset yet (it's
not part of "self") - right? (Or is there a subtlety that I'm missing?)
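To make that concrete, here's a sketch of the timing as I understand it,
reusing the Tweets schema from Ildar's example at the bottom of this
thread (the function name and predicate are just made up for illustration):

create function check_incoming($x) {
  // $x is the incoming record; it has not yet been inserted into Tweets,
  // so this scan sees only previously committed records:
  for $y in dataset Tweets
  where $y.username = $x.username
  return $y
}

Only after check_incoming($x) has been fully evaluated would the new
record move on down the pipeline and actually get inserted into Tweets.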
Cheers,
Mike
On 12/9/15 6:59 AM, abdullah alamoudi wrote:
The only problem I see is the Halloween problem in the case of a self-join,
hence the need for materialization (not sure if it is possible in this case,
but it is definitely possible in general). Other than that, I don't think
there is any problem.
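For reference, the classic shape of the problem in an insert pipeline,
written against Ildar's Tweets dataset below (illustrative only): the
source scan reads the same dataset that the statement inserts into.

insert into dataset Tweets (
  for $t in dataset Tweets
  return {
    "id": string-concat([$t.id, "-copy"]),
    "username": $t.username,
    "location": $t.location,
    "text": $t.text,
    "timestamp": $t.timestamp
  }
);

Without materializing the scan's output first, the scan could keep
encountering the very records the statement is inserting.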
Cheers,
Abdullah
On Dec 8, 2015 11:51 PM, "Mike Carey" <[email protected]> wrote:
(I am still completely not seeing a problem here.)
On 12/8/15 10:20 PM, abdullah alamoudi wrote:
The plan is to mostly use upsert in the future, since we can do some
optimizations with it that we can't do with an insert. We should support
deletes as well, and probably allow a mix of the three operations within
the same feed. This is a work in progress right now, but before I go too
far I am stabilizing some other parts of the feeds.
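For example (with made-up values), an upsert looks like an insert, but a
record whose primary key already exists in the dataset is overwritten
instead of raising a duplicate-key error:

upsert into dataset Tweets (
  { "id": "42",
    "username": "jdoe",
    "location": "CA",
    "text": "hello again",
    "timestamp": "2015-12-08T22:20:00" }
);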
Cheers,
Abdullah.
Amoudi, Abdullah.
On Tue, Dec 8, 2015 at 10:11 PM, Ildar Absalyamov <[email protected]> wrote:
Abdullah,
OK, now I see what problems it will cause.
A kinda related question: could the feed implement the “upsert” semantics
that you’ve been working on, instead of “insert” semantics?
On Dec 8, 2015, at 21:52, abdullah alamoudi <[email protected]> wrote:
I think that we probably should restrict feed-applied functions somehow
(this needs further thought and discussion), and I know for sure that
currently we don't.
As for the case you present, I would imagine that it could theoretically
be allowed, but I think everyone sees why it should be disallowed.
One thing to keep in mind is that we introduce a materialization step when
the target dataset is also part of the insert pipeline. Now think about how
this would work with a continuous feed. One choice would be for the feed to
materialize all records to be inserted and then, once the feed stops, start
inserting them - but I still think we should not allow it.
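To illustrate that choice (purely hypothetical - StagedTweets is a made-up
staging dataset, and this assumes temporary datasets are available), the
feed would effectively be doing something like:

create temporary dataset StagedTweets(TweetType) primary key id;

// while the feed runs, processed records accumulate in StagedTweets;
// only once the feed stops does the deferred insert run:
insert into dataset Tweets (
  for $t in dataset StagedTweets
  return $t
);

which delays every insert until the feed stops - another reason I think
we should not allow it.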
My 2c,
Any opposing argument?
Amoudi, Abdullah.
On Tue, Dec 8, 2015 at 6:28 PM, Ildar Absalyamov <[email protected]> wrote:
Hi All,
As a part of feed ingestion we do allow preprocessing incoming data with
AQL UDFs.
I was wondering if we somehow restrict the kinds of UDFs that could be
used? Do we allow joins in these UDFs - especially joins with the same
dataset that is used for intake? Ex:
create type TweetType as open {
  id: string,
  username: string,
  location: string,
  text: string,
  timestamp: string
}

create dataset Tweets(TweetType) primary key id;

create function feed_processor($x) {
  for $y in dataset Tweets
  // self-join with the Tweets dataset on some predicate($x, $y)
  return $y
}

create feed TweetFeed
// (adapter/"using" clause omitted)
apply function feed_processor;
The query above fails at runtime, but I was wondering whether it could
theoretically work at all.
Best regards,
Ildar