Hi Devs, I have two related questions:

1. Is there any example code for using a UDF in a feed adapter?
2. Can we use AQL functions in those feed UDFs?
Thank you.

On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <[email protected]> wrote:
> Thanks!
>
> On 10/27/15 9:48 AM, Raman Grover wrote:
>
>> Hi,
>>
>> When data is being received from an external source (e.g. during feed
>> ingestion), a slow rate of arrival may result in excessive delays until the
>> data is deposited into the target dataset and made accessible to queries.
>> Data moves along a data ingestion pipeline between operators as packed,
>> fixed-size frames. The default behavior is to wait for a frame to be full
>> before dispatching the contained data to the downstream operator. However,
>> this may not suit all scenarios, particularly when the data source sends
>> data at a low rate. To cater to different scenarios, AsterixDB allows
>> configuring this behavior. The different options are described next.
>>
>> Push data downstream when:
>> (a) the frame is full (default);
>> (b) at least N records (data items) have been collected into a partially
>> filled frame;
>> (c) at least T seconds have elapsed since the last record was put into
>> the frame.
>>
>> How to configure the behavior?
>> When defining a feed, an end user may specify configuration parameters
>> that determine the runtime behavior (option (a), (b), or (c) above).
>>
>> The parameters are described below:
>>
>> "parser-policy": a specific strategy chosen from a set of pre-defined
>> values:
>>
>> (i) "frame_full"
>> This is the default value. As the name suggests, this choice causes frames
>> to be pushed by the feed adapter only when there isn't sufficient space
>> for an additional record to fit in. This corresponds to option (a).
>>
>> (ii) "counter_timer_expired"
>> Use this value if you wish to set option (b), option (c), or a combination
>> of both.
>>
>> Some examples:
>>
>> 1) Pack a maximum of 100 records into a data frame and push it downstream.
>> create feed my_feed using my_adaptor
>> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
>> other parameters);
>>
>> 2) Wait up to 2 seconds and send however many records have been collected
>> in the frame downstream.
>>
>> create feed my_feed using my_adaptor
>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"), ...
>> other parameters);
>>
>> 3) Wait until 100 records have been collected into a data frame, or until
>> 2 seconds have elapsed since the last record was put into the current data
>> frame.
>>
>> create feed my_feed using my_adaptor
>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
>> ("batch-size"="100"), ... other parameters);
>>
>> Note: the above config parameters are not specific to a particular adapter
>> implementation; they are available for use with any feed adapter. Some
>> adapters that ship with AsterixDB use different default values to suit
>> their specific scenario. E.g., the pull-based Twitter adapter uses
>> "counter_timer_expired" as the "parser-policy" and sets the parameter
>> "batch-interval".
>>
>> Regards,
>> Raman
>>
>> PS: The names of the parameters described above are not as intuitive as
>> one would like them to be. The names need to be changed.
>>
>> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <[email protected]> wrote:
>>
>> I think we need to have tuning parameters - like batch size and maximum
>> tolerable latency (in case there's a lull and you still want to push
>> stuff with some worst-case delay). @Raman Grover - remind me (us) what's
>> available in this regard?
>>
>> On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
>>
>>> Hi,
>>>
>>> Yes, you are right. I tried sending a larger amount of data, and data is
>>> now stored to the database.
>>>
>>> Does it make sense to configure a smaller batch size in order to get
>>> more frequent writes? Or would it significantly impact performance?
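The counter/timer flush policy described above (push a partially filled frame once N records have been collected or T seconds have elapsed since the last record) can be sketched as a toy model. This is purely illustrative and not AsterixDB's actual implementation; the class and method names are made up, and the injectable clock exists only to make the timer behavior easy to demonstrate:

```python
import time


class FrameBatcher:
    """Toy model of the 'counter_timer_expired' parser-policy:
    flush when batch_size records are buffered, or when batch_interval
    seconds have passed since the last record arrived. Illustrative
    sketch only; names and structure are hypothetical."""

    def __init__(self, batch_size=100, batch_interval=2.0, clock=time.monotonic):
        self.batch_size = batch_size
        self.batch_interval = batch_interval
        self.clock = clock          # injectable for testing
        self.frame = []
        self.last_record_at = None

    def add(self, record):
        """Buffer a record; return a flushed frame if the size threshold fired."""
        self.frame.append(record)
        self.last_record_at = self.clock()
        if len(self.frame) >= self.batch_size:
            return self.flush()
        return None

    def poll(self):
        """Called periodically; flush if the time threshold fired."""
        if self.frame and self.clock() - self.last_record_at >= self.batch_interval:
            return self.flush()
        return None

    def flush(self):
        frame, self.frame = self.frame, []
        return frame
```

With `batch_size=3`, the third `add` returns the frame immediately (option (b)); with a large `batch_size`, a single buffered record is pushed out by `poll` once the interval elapses (option (c)). A smaller batch size, as Pekka asks, trades some per-frame overhead for lower latency.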
>>> -Pekka

>>> Data moves through the pipeline in frame-sized batches, so one
>>> (uninformed :-)) guess is that you aren't running very long, and you're
>>> only seeing the data flow when you close because only then do you have a
>>> batch's worth. Is that possible? You can test this by running longer
>>> (more data) and seeing if you start to see the expected incremental
>>> flow/inserts. (And we need tunability in this area, e.g., parameters on
>>> how much batching and/or how much latency to tolerate on each feed.)
>>>
>>> On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
>>>
>>> > Hi,
>>> >
>>> > Thanks, now I am able to create a socket feed and save items to the
>>> > dataset from the feed.
>>> >
>>> > It seems that data items are written to the dataset only after I close
>>> > the socket at the client. Is there some way to indicate to an AsterixDB
>>> > feed (with a newline or other indicator) that data can be written to
>>> > the database while the connection is open?
>>> >
>>> > After I close the socket at the client, the feed seems to close down.
>>> > Or is it only paused until it is resumed?
>>> >
>>> > -Pekka
>>> >
>>> > Hi Pekka,
>>> >
>>> > That's interesting; I'm not sure why the CC would appear as being down
>>> > to Managix. However, if you can access the web console, that evidently
>>> > isn't the case.
>>> >
>>> > As for data ingestion via sockets, yes, it is possible, but it kind of
>>> > depends on what's meant by sockets.
>>> > There's no tutorial for it, but take a look at SocketBasedFeedAdapter
>>> > in the source, as well as
>>> > https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
>>> > for some examples of how it works.
>>> >
>>> > Hope that helps!
>>> >
>>> > Thanks,
>>> > -Ian
>>> >
>>> > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
>>> > <[email protected]> wrote:
>>> > > Hi Ian,
>>> > >
>>> > > Thanks for the reply. I compiled AsterixDB v0.8.7 and started it.
>>> > >
>>> > > However, I get the following warnings:
>>> > >
>>> > > INFO: Name:my_asterix
>>> > > Created:Mon Oct 19 08:37:16 UTC 2015
>>> > > Web-Url:http://192.168.101.144:19001
>>> > > State:UNUSABLE
>>> > >
>>> > > WARNING!:Cluster Controller not running at master
>>> > >
>>> > > Also, I see the following warnings in my_asterixdb1.log; there are no
>>> > > warnings or errors in cc.log:
>>> > >
>>> > > "Oct 19, 2015 8:37:39 AM
>>> > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
>>> > > SEVERE: LifecycleComponentManager configured
>>> > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
>>> > > ..
>>> > > INFO: Completed sharp checkpoint.
>>> > > Oct 19, 2015 8:37:40 AM
>>> > > org.apache.asterix.om.util.AsterixClusterProperties getIODevices
>>> > > WARNING: Configuration parameters for nodeId my_asterix_node1 not
>>> > > found. The node has not joined yet or has left.
>>> > > Oct 19, 2015 8:37:40 AM
>>> > > org.apache.asterix.om.util.AsterixClusterProperties getIODevices
>>> > > WARNING: Configuration parameters for nodeId my_asterix_node1 not
>>> > > found. The node has not joined yet or has left.
>>> > > Oct 19, 2015 8:38:38 AM
>>> > > org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
>>> > > INFO: Result state cleanup instance successfully completed."
>>> > >
>>> > > It seems that AsterixDB is running, and I can access it at port 19001.
>>> > >
>>> > > The documentation shows ingestion of tweets, but I would be
>>> > > interested in using sockets. Is it possible to ingest data from
>>> > > sockets?
>>> > >
>>> > > Regards,
>>> > > -Pekka
>>> > >
>>> > > Hey there Pekka,
>>> > >
>>> > > Your intuition is correct; most of the newer feeds features are in
>>> > > the current master branch and not in the (very) old 0.8.6 release.
>>> > > If you'd like to experiment with them, you'll have to build from
>>> > > source. The details about that are here:
>>> > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
>>> > > but they're probably a bit overkill for just trying to get the
>>> > > compiled binaries.
>>> > > For that, all you really need to do is:
>>> > >
>>> > > - Clone Hyracks from git
>>> > > - 'mvn clean install -DskipTests'
>>> > > - Clone AsterixDB
>>> > > - 'mvn clean package -DskipTests'
>>> > >
>>> > > Then the binaries will sit in asterix-installer/target.
>>> > >
>>> > > For an example, the documentation shows how to set up a feed that's
>>> > > ingesting Tweets:
>>> > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
>>> > >
>>> > > Thanks,
>>> > > -Ian
>>> > >
>>> > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka
>>> > > <[email protected]> wrote:
>>> > >
>>> > >> Hi,
>>> > >>
>>> > >> I would like to experiment with a socket-based feed. Can you point
>>> > >> me to an example of how to utilize one?
>>> > >>
>>> > >> Do I need to install the 0.8.7-snapshot version of AsterixDB in
>>> > >> order to experiment with feeds?
>>> > >>
>>> > >> Regards,
>>> > >> -Pekka Pääkkönen
>>
>> --
>> Raman

--
-----------------
Best Regards
Jianfeng Jia
Ph.D. Candidate of Computer Science
University of California, Irvine
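On the client side, the socket-based feed discussed in this thread only needs a TCP connection that writes delimited records while it stays open. A minimal sketch in Python follows; the host, port, and the assumption that the adapter accepts newline-delimited ADM/JSON-style records are all hypothetical and depend on how the feed is actually configured:

```python
import socket


def send_records(host, port, records, delimiter=b"\n"):
    """Open a TCP connection to a (hypothetical) socket feed endpoint and
    push each serialized record, delimiter-terminated, without closing the
    connection between records. The newline delimiter and endpoint are
    assumptions about the adapter's configuration, not a documented API."""
    with socket.create_connection((host, port)) as sock:
        for rec in records:
            # e.g. rec = '{"id": 1, "msg": "hello"}' in ADM-like syntax
            sock.sendall(rec.encode("utf-8") + delimiter)

# Hypothetical usage against a locally configured feed endpoint:
# send_records("127.0.0.1", 10001, ['{"id": 1, "msg": "hello"}'])
```

Whether records become queryable while the connection is open then depends on the frame-flushing policy discussed earlier in the thread ("parser-policy", "batch-size", "batch-interval"), not on the client closing the socket.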
