Yes and yes, I believe! The AQL UDF case is less tested, I believe, but it should work...

On Oct 29, 2015 12:22 PM, "Jianfeng Jia" <[email protected]> wrote:
> Hi Devs,
>
> I have two related questions,
> 1. Is there any example code of using UDF in feed-adapter?
> 2. Can we use AQL function in those kind of feed UDFs?
>
> Thank you.
>
> On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <[email protected]> wrote:
>
> > Thanks!
> >
> > On 10/27/15 9:48 AM, Raman Grover wrote:
> >
> >> Hi,
> >>
> >> In the case when data is being received from an external source (e.g.
> >> during feed ingestion), a slow rate of arrival of data may result in
> >> excessive delays until the data is deposited into the target dataset and
> >> made accessible to queries. Data moves along a data ingestion pipeline
> >> between operators as packed fixed size frames. The default behavior is to
> >> wait for the frame to be full before dispatching the contained data to the
> >> downstream operator. However, as noted, this may not suit all scenarios
> >> particularly when data source is sending data at a low rate. To cater to
> >> different scenarios, AsterixDB allows configuring the behavior. The
> >> different options are described next.
> >>
> >> *Push data downstream when*
> >> (a) Frame is full (default)
> >> (b) At least N records (data items) have been collected into a partially
> >> filled frame
> >> (c) At least T seconds have elapsed since the last record was put into
> >> the frame
> >>
> >> *How to configure the behavior?*
> >> At the time of defining a feed, an end-user may specify configuration
> >> parameters that determine the runtime behavior (options (a), (b) or (c)
> >> from above).
> >>
> >> The parameters are described below:
> >>
> >> /"parser-policy"/: A specific strategy chosen from a set of pre-defined
> >> values -
> >> (i) / "frame_full"/
> >> This is the default value. As the name suggests, this choice causes
> >> frames to be pushed by the feed adaptor only when there isn't sufficient
> >> space for an additional record to fit in. This corresponds to option (a).
> >>
> >> (ii) / "counter_timer_expired" /
> >> Use this as the value if you wish to set either option (b) or (c) or a
> >> combination of both.
> >>
> >> *Some Examples*
> >>
> >> 1) Pack a maximum of 100 records into a data frame and push it downstream.
> >>
> >> create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
> >> other parameters);
> >>
> >> 2) Wait till 2 seconds and send however many records collected in a frame
> >> downstream.
> >> create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2")...
> >> other parameters);
> >>
> >> 3) Wait till 100 records have been collected into a data frame or 2
> >> seconds have elapsed since the last record was put into the current data
> >> frame.
> >> create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
> >> ("batch-size"="100"),... other parameters);
> >>
> >> *Note*
> >> The above config parameters are not specific to using a particular
> >> implementation of an adaptor but are available for use with any feed
> >> adaptor. Some adaptors that ship with AsterixDB use different default
> >> values for above to suit their specific scenario. E.g. the pull-based
> >> twitter adaptor uses "counter_timer_expired" as the "parser-policy" and
> >> sets the parameter "batch-interval".
> >>
> >> Regards,
> >> Raman
> >> PS: The names of the parameters described above are not as intuitive as
> >> one would like them to be. The names need to be changed.
> >>
> >> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <[email protected]> wrote:
> >>
> >>     I think we need to have tuning parameters - like batch size and
> >>     maximum tolerable latency (in case there's a lull and you still
> >>     want to push stuff with some worst-case delay).
> >>     @Raman Grover - remind me (us) what's available in this regard?
> >>
> >> On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
> >>
> >>> Hi,
> >>>
> >>> Yes, you are right. I tried sending a larger amount of data, and
> >>> data is now stored to the database.
> >>>
> >>> Does it make sense to configure a smaller batch size in order to
> >>> get more frequent writes?
> >>>
> >>> Or would it significantly impact performance?
> >>>
> >>> -Pekka
> >>>
> >>> Data moves through the pipeline in frame-sized batches, so one
> >>> (uninformed :-)) guess is that you aren't running very long, and you're
> >>> only seeing the data flow when you close because only then do you have a
> >>> batch's worth. Is that possible? You can test this by running longer
> >>> (more data) and seeing if you start to see the expected incremental
> >>> flow/inserts. (And we need tunability in this area, e.g., parameters on
> >>> how much batching and/or how much latency to tolerate on each feed.)
> >>>
> >>> On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > Thanks, now I am able to create a socket feed, and save items to the
> >>> > dataset from the feed.
> >>> >
> >>> > It seems that data items are written to the dataset after I close the
> >>> > socket at the client.
> >>> >
> >>> > Is there some way to indicate to AsterixDB feed (with a newline or
> >>> > other indicator) that data can be written to the database, when the
> >>> > connection is open?
> >>> >
> >>> > After I close the socket at the client, the feed seems to close down.
> >>> > Or is it only paused, until it is resumed?
> >>> >
> >>> > -Pekka
> >>> >
> >>> > Hi Pekka,
> >>> >
> >>> > That's interesting, I'm not sure why the CC would appear as being down
> >>> > to Managix. However, if you can access the web console, that
> >>> > evidently isn't the case.
> >>> >
> >>> > As for data ingestion via sockets, yes it is possible, but it kind of
> >>> > depends on what's meant by sockets. There's no tutorial for it, but
> >>> > take a look at SocketBasedFeedAdapter in the source, as well as
> >>> >
> >>> > https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
> >>> >
> >>> > for some examples of how it works.
> >>> >
> >>> > Hope that helps!
> >>> >
> >>> > Thanks,
> >>> >
> >>> > -Ian
> >>> >
> >>> > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka <[email protected]> wrote:
> >>> > > Hi Ian,
> >>> > >
> >>> > > Thanks for the reply.
> >>> > >
> >>> > > I compiled AsterixDB v0.87 and started it.
> >>> > >
> >>> > > However, I get the following warnings:
> >>> > >
> >>> > > INFO: Name:my_asterix
> >>> > > Created:Mon Oct 19 08:37:16 UTC 2015
> >>> > > Web-Url:http://192.168.101.144:19001
> >>> > > State:UNUSABLE
> >>> > >
> >>> > > WARNING!:Cluster Controller not running at master
> >>> > >
> >>> > > Also, I see the following warnings in my_asterixdb1.log.
> >>> > > There are no warnings or errors in cc.log
> >>> > >
> >>> > > “
> >>> > > Oct 19, 2015 8:37:39 AM org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
> >>> > > SEVERE: LifecycleComponentManager configured org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
> >>> > > ..
> >>> > > INFO: Completed sharp checkpoint.
> >>> > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
> >>> > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The node has not joined yet or has left.
> >>> > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
> >>> > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found. The node has not joined yet or has left.
> >>> > > Oct 19, 2015 8:38:38 AM org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
> >>> > > INFO: Result state cleanup instance successfully completed.”
> >>> > >
> >>> > > It seems that AsterixDB is running, and I can access it at port 19001.
> >>> > >
> >>> > > The documentation shows ingestion of tweets, but I would be interested in
> >>> > > using sockets.
> >>> > >
> >>> > > Is it possible to ingest data from sockets?
> >>> > >
> >>> > > Regards,
> >>> > >
> >>> > > -Pekka
> >>> > >
> >>> > > Hey there Pekka,
> >>> > >
> >>> > > Your intuition is correct, most of the newer feeds features are in the
> >>> > > current master branch and not in the (very) old 0.8.6 release. If you'd
> >>> > > like to experiment with them you'll have to build from source. The details
> >>> > > about that are here:
> >>> > >
> >>> > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
> >>> > >
> >>> > > , but they're probably a bit overkill for just trying to get the compiled
> >>> > > binaries. For that all you really need to do is:
> >>> > >
> >>> > > - Clone Hyracks from git
> >>> > > - 'mvn clean install -DskipTests'
> >>> > > - Clone AsterixDB
> >>> > > - 'mvn clean package -DskipTests'
> >>> > >
> >>> > > Then, the binaries will sit in asterix-installer/target
> >>> > >
> >>> > > For an example, the documentation shows how to set up a feed that's
> >>> > > ingesting Tweets:
> >>> > >
> >>> > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > -Ian
> >>> > >
> >>> > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka <[email protected]>
> >>> > > wrote:
> >>> > >
> >>> > >> Hi,
> >>> > >>
> >>> > >> I would like to experiment with a socket-based feed.
> >>> > >>
> >>> > >> Can you point me to an example on how to utilize them?
> >>> > >>
> >>> > >> Do I need to install 0.8.7-snapshot version of AsterixDB in order to
> >>> > >> experiment with feeds?
> >>> > >>
> >>> > >> Regards,
> >>> > >>
> >>> > >> -Pekka Pääkkönen
> >>
> >> --
> >> Raman
>
> --
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
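
For anyone reading the thread later: the three push conditions Raman describes (frame full, at least N records, at least T seconds since the last record) can be modeled with a small toy sketch. This is not AsterixDB code. The class `FrameBatcher`, its methods, and the idea of measuring frame capacity in records rather than bytes are all assumptions made for illustration; only the parameter names `batch_size` and `batch_interval` are meant to mirror the "batch-size" and "batch-interval" settings quoted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FrameBatcher:
    """Toy model of the push conditions (a), (b), (c); hypothetical, not AsterixDB."""
    frame_capacity: int                     # records per frame (assumption: real frames are byte-sized)
    batch_size: Optional[int] = None        # mirrors "batch-size": push after N records
    batch_interval: Optional[float] = None  # mirrors "batch-interval": push after T idle seconds
    frame: List[str] = field(default_factory=list)
    last_put: Optional[float] = None        # time the most recent record was added

    def add(self, record: str, now: float) -> Optional[List[str]]:
        """Put a record into the frame; return the frame contents if it must be pushed."""
        self.frame.append(record)
        self.last_put = now
        if len(self.frame) >= self.frame_capacity:  # option (a): frame is full
            return self._flush()
        if self.batch_size is not None and len(self.frame) >= self.batch_size:  # option (b)
            return self._flush()
        return None

    def tick(self, now: float) -> Optional[List[str]]:
        """Periodic check: push a partial frame once T seconds passed since the last record."""
        if (self.batch_interval is not None and self.frame
                and now - self.last_put >= self.batch_interval):  # option (c)
            return self._flush()
        return None

    def _flush(self) -> List[str]:
        out, self.frame = self.frame, []
        return out


# Roughly example 3 from the thread, scaled down: 3 records or 2 seconds.
batcher = FrameBatcher(frame_capacity=1000, batch_size=3, batch_interval=2.0)
```

With neither `batch_size` nor `batch_interval` set, the sketch behaves like the "frame_full" default: a slow source leaves records sitting in the frame, which matches what Pekka observed before closing the socket.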
