Approach number two seems right, i.e., synchronize the steps so that the input is ready first.

On Sep 30, 2015 2:39 PM, "Steven Jacobs" <[email protected]> wrote:
> I think the problem with doing a single job (as mentioned) is that the
> intake job will exist for many connection jobs, meaning that there is a
> single intake job for the feed, and a connection job for each connection
> to a dataset.
> Steven
>
> On Wed, Sep 30, 2015 at 12:05 PM, abdullah alamoudi <[email protected]>
> wrote:
>
> > So I might have an idea about what could cause this.
> > Following is some information about how feeds work. Please correct me
> > if I am wrong, as I am just starting to dive deep into this.
> >
> > -- Creating and dropping feeds are just metadata operations.
> > -- When you connect a primary feed to a dataset, this is what happens:
> > 1. A feed event subscriber is created for the feed and registered with
> > the feed lifecycle listener (a singleton running on the master).
> > 2. A feed intake job is constructed that consists of just the feed intake
> > operator and a sink operator. When this job starts, it sits in memory
> > doing nothing because it has no subscribers yet.
> > 3. Once the job [2] is submitted, the listener in [1] gets notified and
> > constructs an adm command that creates a Hyracks job, which has a feed
> > collect operator that gets records from the running intake job [2] and
> > feeds them into the dataset.
> > 4. There is no synchronization between [2] and [3], so there is a chance
> > that [3] starts before [2] is ready, doesn't find the intake runtime,
> > and throws an exception. I know the chance is slim, but it is there
> > (it has happened to me).
> > 5. At that point, the intake job will never return since it is just
> > sitting in memory.
> >
> > I am not sure about this, but I am guessing that the larger the cluster,
> > the higher the chance that one runs into this.
> >
> > The question I have is: since at the connect statement we already know
> > everything about the dataset that the feed will feed into, why don't
> > we construct a single job that has two roots (the sink and the commit)?
> > Another option would be to make sure that the intake is ready on all
> > nodes before the subscription is submitted.
> >
> > Does any of this make sense?
> >
> > Amoudi, Abdullah.
> >
> > On Mon, Sep 14, 2015 at 8:23 PM, Till Westmann (JIRA) <[email protected]>
> > wrote:
> >
> > >
> > > [ https://issues.apache.org/jira/browse/ASTERIXDB-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> > >
> > > Till Westmann updated ASTERIXDB-1085:
> > > -------------------------------------
> > > Assignee: Abdullah Alamoudi
> > >
> > > > Sporadic failures in Feed related tests
> > > > ---------------------------------------
> > > >
> > > > Key: ASTERIXDB-1085
> > > > URL: https://issues.apache.org/jira/browse/ASTERIXDB-1085
> > > > Project: Apache AsterixDB
> > > > Issue Type: Bug
> > > > Components: AsterixDB, Feeds
> > > > Reporter: Abdullah Alamoudi
> > > > Assignee: Abdullah Alamoudi
> > > >
> > > > Sporadically, test cases which use Feeds (not necessarily in the feed
> > > > test group) fail. No exceptions are thrown, but records which are
> > > > supposed to be in the dataset are not there, and subsequent queries
> > > > return empty results.
> > >
> > > --
> > > This message was sent by Atlassian JIRA
> > > (v6.3.4#6332)
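
To make the second option in the quoted thread concrete (wait until the intake is ready on all nodes before submitting the subscription), here is a minimal sketch. It is only an illustration of the synchronization idea using a plain java.util.concurrent.CountDownLatch. The class IntakeReadinessGate and its methods are hypothetical names invented for this example; they are not AsterixDB or Hyracks APIs, and the real feed lifecycle code may need a different mechanism (for example, one that works across nodes rather than threads).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names are AsterixDB or Hyracks APIs.
// Threads stand in for cluster nodes to keep the example self-contained.
public class IntakeReadinessGate {
    private final CountDownLatch ready;

    // intakeNodeCount: number of nodes that must register an intake runtime.
    public IntakeReadinessGate(int intakeNodeCount) {
        this.ready = new CountDownLatch(intakeNodeCount);
    }

    // Called once per node after its intake runtime has been registered.
    public void intakeRuntimeRegistered() {
        ready.countDown();
    }

    // Called before the collect (connection) job is submitted.
    // Returns true only if every node reported ready within the timeout.
    public boolean awaitIntakeReady(long timeout, TimeUnit unit) throws InterruptedException {
        return ready.await(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        int nodes = 3;
        IntakeReadinessGate gate = new IntakeReadinessGate(nodes);

        for (int i = 0; i < nodes; i++) {
            final long delayMs = 50L * i; // simulate staggered intake startup
            new Thread(() -> {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                gate.intakeRuntimeRegistered();
            }).start();
        }

        // Step [3] from the thread is only attempted once step [2] is ready everywhere.
        if (gate.awaitIntakeReady(5, TimeUnit.SECONDS)) {
            System.out.println("intake ready on all nodes; submitting collect job");
        } else {
            System.out.println("intake not ready in time; not submitting collect job");
        }
    }
}
```

With a gate like this, a collect job that would otherwise fail to find the intake runtime is either delayed until the intake is up or rejected with a clear timeout, instead of leaving the intake job sitting in memory with no subscriber.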
