On 1 Jun 2015, at 13:47, Jean-Baptiste Note <jbn...@gmail.com> wrote:

Hi there,

I've successfully exported some host/port dynamic combination in slider for
Kafka on Yarn; they are made available under
publisher/exports/servers on the appmaster (see
https://github.com/jbnote/koya/).

I'm now trying to access this information (really, service location) in two
different ways:

* From within slider. Is there a public API that I could use directly in
python from other slider instances to get to this information ? -- this is
necessary for spawning Kafka mirroring from slider, for instance. From what
I can see in storm-slider, the slider binary is directly invoked.


The code to look up entries is in the hadoop-yarn-registry API, which ships in 
Hadoop 2.6.


* From the rest of the world. I was thinking of exporting the data to DNS,
and hoped to do this with a zookeeper-monitoring daemon, which is already
partially implemented. However, none of my exported data seems to be
present in ZK, which I was naively hoping for. Is there something I'm
missing ? I find the ZK way perfect, rather than the REST API which as far
as I can see will require polling. In python monitoring ZK is a breeze.

Can someone familiar with the design intent shed some light on how I should
carry this out ?

YARN-913 is the registry design; it's documented in 
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/registry/index.html

  1.  everything is (publicly) published to ZK
  2.  There's an API ( http://hadoop.apache.org/docs/current/api/index.html ) 
in Java;
  3.  Slider has a .py client too.
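
As a rough illustration of where a registry entry would be looked up in ZK, here is a minimal Python sketch that only builds the expected znode path. It assumes the layout described in the registry documentation (/users/${user}/services/${serviceclass}/${instance}), a registry root of "/registry", and "org-apache-slider" as Slider's service class; all three should be checked against your cluster's configuration. The function name service_path is made up for this example.

```python
# Assumed default of hadoop.registry.zk.root; verify against your cluster config.
REGISTRY_ROOT = "/registry"
# Assumed service class under which Slider registers its applications.
SLIDER_SERVICE_CLASS = "org-apache-slider"


def service_path(user, instance,
                 service_class=SLIDER_SERVICE_CLASS,
                 root=REGISTRY_ROOT):
    """Build the ZK path where a service record is expected to live.

    Hypothetical helper: it only concatenates path elements and does not
    apply any of the name encoding the Java registry client may perform.
    """
    parts = [root.rstrip("/"), "users", user, "services", service_class, instance]
    return "/".join(parts)


if __name__ == "__main__":
    print(service_path("alice", "koya"))
```

With a ZK client library such as kazoo, the resulting path could then be read with client.get(path) or watched; the record format itself is described in the registry documentation linked above.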

Slider deliberately doesn't publish the full set of documents to the registry; 
too much data and too high a rate of change are what hurt ZK scalability and 
performance.

Instead we have a slider-specific API for publishing sets of configurations, 
each configuration being served up as JSON, XML, or properties.

Look at org.apache.slider.server.appmaster.web.rest.publisher.PublisherResource 
for the specifics, but it essentially comes down to:


list the configuration sets (JSON)
GET ws/v1/exports/

list the configuration files of a configuration set
GET ws/v1/exports/${configset}

retrieve a configuration
GET ws/v1/exports/${configset}/${configuration}.${suffix}

suffix = [xml|json|properties]

finally, get a specific property
GET ws/v1/exports/${configset}/${configuration}/${property}
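
The URL patterns above could be wrapped in a small Python helper along these lines. This is only a sketch: the function names exports_url and fetch_json and the am_base parameter (the AM's web address, which you would have to discover separately) are assumptions for illustration, and the paths are taken verbatim from the list above rather than checked against PublisherResource.

```python
import json
from urllib.request import urlopen

SUFFIXES = ("xml", "json", "properties")


def exports_url(am_base, configset=None, configuration=None,
                suffix=None, prop=None):
    """Build one of the export URLs listed above.

    am_base is a hypothetical base address of the AM web UI,
    e.g. "http://host:port".
    """
    url = am_base.rstrip("/") + "/ws/v1/exports/"
    if configset is None:
        return url                      # list of configuration sets
    url += configset
    if configuration is None:
        return url                      # files of one configuration set
    url += "/" + configuration
    if prop is not None:
        return url + "/" + prop         # a single property
    if suffix is not None:
        if suffix not in SUFFIXES:
            raise ValueError("suffix must be one of %s" % (SUFFIXES,))
        return url + "." + suffix       # one configuration in a given format
    return url


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON (needs a reachable AM)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, exports_url(base, "servers", "hosts", suffix="json") would build the URL of a configuration named "hosts" (a made-up name) in the "servers" set, which fetch_json could then retrieve.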


Regarding Python monitoring, our code is in the slider-agent module. Bear in 
mind that ZK listening isn't that resilient to failures of ZK nodes. Our agent 
only checks ZK at startup and then starts polling after the AM fails.
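
If you do end up polling, a generic loop like the following may be enough. This is not the slider-agent code, just a minimal sketch: poll_for_changes is a made-up name, fetch is whatever callable you use to read the data (REST or ZK), and a failed fetch is simply treated as "no new data".

```python
import time

_UNSET = object()  # sentinel meaning "no value seen yet"


def poll_for_changes(fetch, interval=5.0, max_polls=None, sleep=time.sleep):
    """Call fetch() repeatedly and yield its result whenever it changes.

    fetch     -- zero-argument callable returning the current data
    interval  -- seconds to wait between polls
    max_polls -- stop after this many polls (None means poll forever)
    """
    last = _UNSET
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            current = fetch()
        except OSError:
            current = _UNSET  # endpoint unreachable; keep the last known value
        if current is not _UNSET and (last is _UNSET or current != last):
            last = current
            yield current
        sleep(interval)
```

A caller would iterate over poll_for_changes(my_fetch) and update its DNS records (or whatever consumes the data) on each yielded value.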

The Hive LLAP team is using the YARN registry now and wants to add a TTL field 
to each entry; this would let the client know when to recheck.

