Thanks -- I think I missed part of the point of logical sinks the first time I went through the doc. I was more interested in being able to add or remove nodes from a flow dynamically as they happened to come up or power down. I got the impression that was what logical nodes were intended for, but that they fell short of what was really needed, so autoFailoverChains and flows were added as a quick fix. (I'm not familiar with the history; this is just the impression I get from the fact that flows aren't supported by the multiconfig syntax, and from the lack of support for both solutions when you're using a distributed master. That's a shame: after all, if you want complete fault tolerance, you need both a distributed master and autoFailoverChains.) The approach I came up with didn't rely much on logical nodes, so I didn't spend as much time working with them.
https://github.com/cloudera/flume/blob/master/flume-core/src/main/antlr3/com/cloudera/flume/shell/antlr/FlumeShell.g
https://github.com/cloudera/flume/blob/master/flume-core/src/main/antlr3/com/cloudera/flume/conf/FlumeDeploy.g

I was just looking at the github repository (see links above) to see if there was any history that might explain when and why the inconsistency between Identifier and Argument was introduced, but the repo only goes back to 2010, and I'm not sure where to find older history. At the time the code was added, both terms were already defined as they currently are (with the colon being the only difference).

I'm guessing part of the problem was introduced when Identifier was overloaded to describe the syntax for both Java function names and host ids. As is, it allows the dot '.', which is necessary for host names like hellofrom.somewhere.com but would lead to invalid Java code if used in a function name. Conversely, it allows the underscore, which isn't entirely valid in standards-conforming host names. Personally, I think it would be a good idea to break that one term out so that functions and sink names have one set of valid characters and host names have a separate specification.

The colon is a tricky one, especially in the situation where you want to use it. First off, if it were tacked onto the end of a host name, I would normally think it was specifying a port. Secondly, given the multiconfig syntax of 'HOST:SOURCE|SINK', that line could get quite confusing to parse if the host name were allowed to contain a colon. You'd need to quote or escape it, so in the end it might be easiest and least confusing just not to allow it.
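To make the difference concrete, here is a rough Python sketch of the Identifier and Argument rules quoted further down in this thread. It is only an approximation: I'm assuming Letter is [A-Za-z] and JavaIDDigit is [0-9], whereas the real ANTLR fragments accept more characters, and the helper names (is_identifier, is_argument) are made up for illustration rather than taken from Flume.

```python
import re

# Assumption: Letter is approximated as [A-Za-z] and JavaIDDigit as [0-9].
# The actual fragments in FlumeShell.g / FlumeDeploy.g are broader than this.

# Identifier : Letter (Letter|JavaIDDigit|'.'|'-'|'_')*
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")

# Argument : (Letter|JavaIDDigit|':'|'.'|'-'|'_')+
ARGUMENT = re.compile(r"[A-Za-z0-9:._-]+")


def is_identifier(name: str) -> bool:
    """True if the whole string matches the approximated Identifier rule."""
    return IDENTIFIER.fullmatch(name) is not None


def is_argument(name: str) -> bool:
    """True if the whole string matches the approximated Argument rule."""
    return ARGUMENT.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("bac:host:accee.log",
                 "collector-sit:ets:txn:ord:test.log",
                 "hellofrom.somewhere.com"):
        print(name, "identifier:", is_identifier(name),
              "argument:", is_argument(name))
```

Under these assumptions, a name containing a colon passes as an Argument (so 'exec map' accepts it) but fails as an Identifier (so it can't appear as the host in config or multiconfig), which matches the behaviour Vic is seeing.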
On Thu, Sep 15, 2011 at 6:03 PM, Huang, Zijian(Victor) <[email protected]> wrote:
> Hi, Jeff:
> you can look at here on the use of logical node:
> http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_logical_nodes
> We use it to have many nodes threads running on a single JVM, and each node
> will stream one file to a different collector. The other approach is to start
> more than one Java processes using the flume cmd.
>
> Look like I can't use colon, so for now I have to replaced by something
> else, but I think the Flume team need to make the grammar more consistent
>
> Vic
>
> -----Original Message-----
> From: Jeff Hansen [mailto:[email protected]]
> Sent: Thursday, September 15, 2011 3:53 PM
> To: [email protected]
> Subject: Re: Restricted character in logical node name
>
> Oh, I see. I could be wrong, but I don't believe you can use logical node
> names in the place of Hosts for configuration purposes. I believe they're
> intended just for use with logicalSinks and logicalSources.
>
> Whether that's the case or not though, when it comes to specifying the host
> name in your config or multiconfig, the antlr grammar files have the "host"
> name using the Identifier syntax I included earlier -- so from that
> perspective the colon is not allowed.
>
> On Thu, Sep 15, 2011 at 5:17 PM, Huang, Zijian(Victor)
> <[email protected]> wrote:
>> Hi, Jeff:
>> Thanks for the detail explanation. I can map the logical node using the ":"
>> in the name, but I have problem configuring it. I am using it this way:
>> ===
>> Exec map xxxx collector-sit:ets:txn:ord:test.log
>> submit multiconfig 'collector-sit:ets:txn:ord:test.log: collectorSource(
>> 16006 ) | text("/tmp/test.log")'
>> ===
>>
>> Getting an syntax error when I trying to do multiconfig. I tried
>> quoting the node name, but it doesn't seem to work. If they allow us
>> to create an logical node with ":" in the name they should provide us
>> a way to configure it as well. I will take a look at their grammar in
>> the mean time
>>
>> Thanks
>>
>> Vic
>>
>> -----Original Message-----
>> From: Jeff Hansen [mailto:[email protected]]
>> Sent: Thursday, September 15, 2011 8:18 AM
>> To: [email protected]
>> Subject: Re: Restricted character in logical node name
>>
>> Are you by any chance using it in somewhere in a config or multiconfig
>> without quoting it?
>>
>> specifically, if you were to say
>>
>> config host someSource logicalSink(bac:host:accee.log)
>>
>> the parser would treat the logical name as a function rather than a string
>> literal and colons aren't allowed in function names.
>>
>> Functions use identifiers:
>> Identifier
>> : Letter (Letter|JavaIDDigit|'.'|'-'|'_')*
>> ;
>>
>> However, when you're mapping the host to a logical name, config
>> arguments are allowed to have colons
>> Argument
>> : (Letter|JavaIDDigit|':'|'.'|'-'|'_')+
>> ;
>>
>> So I assume you'd be fine with a line like
>> exec map somehost bac:host:accee.log
>>
>> Without looking through the code I don't know if there are further
>> constraints, but digging through the antlr syntax in FlumeShell.g and
>> FlumeDeploy.g help me understand the config grammar a lot better.
>>
>> On Wed, Sep 14, 2011 at 6:56 PM, Huang, Zijian(Victor)
>> <[email protected]> wrote:
>>> Hi, Guys:
>>> Is there a list of characters we can't not use in the logical
>>> agent/collector's name. I tried "bac:host:accee.log", it seems Flume
>>> has trouble dealing with ":"
>>>
>>> Thanks
>>>
>>> Victor Huang
