Hi Rob and Gil,
I am trying to do something similar, albeit on a smaller scale. My
configuration looks something like this:
[ModgenKafkaInput0]
type = "KafkaInput"
topic = "logs"
addrs = ["kafka_tmp.modgen.net:9092"]
splitter = "KafkaSplitter"
decoder = "ProtobufDecoder"
group = "kafka-client-group01"
partition = 0
event_buffer_size = 512
max_open_reqests = 8
default_fetch_size = 65536

[ModgenKafkaInput1]
type = "KafkaInput"
topic = "logs"
addrs = ["kafka_tmp.modgen.net:9092"]
splitter = "KafkaSplitter"
decoder = "ProtobufDecoder"
group = "kafka-client-group01"
partition = 1
event_buffer_size = 512
max_open_reqests = 8
default_fetch_size = 65536

[KafkaSplitter]
type = "NullSplitter"
use_message_bytes = true

[ESJsonEncoder]
es_index_from_timestamp = true
type_name = "%{Type}"

[ElasticSearchOutput]
server = "http://localhost:9200"
message_matcher = "Type !~ /^heka/"
encoder = "ESJsonEncoder"
flush_interval = 100 # in ms
flush_count = 50
use_buffering = true
queue_max_buffer_size = 102400000
queue_full_action = "shutdown"
I am currently observing quite low throughput, though. I'm not really
sure why, but the problem seems to be between Kafka and Heka.
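
As a rough illustration of the filter-layer approach Rob describes in the
quoted message below (one filter that catches every message from the topic,
regardless of partition), a config along these lines might work. This is
only a sketch under assumptions: the section name LogsTopicFilter and the
Lua script path are made up for illustration, and the matcher assumes that
messages reaching the router still carry a Topic field with the value
"logs", which may not hold once ProtobufDecoder has replaced the message
contents.

```toml
# Hypothetical filter section; the name and script path are illustrative only.
[LogsTopicFilter]
type = "SandboxFilter"
# Assumed Lua script that would perform the cross-partition correlation.
filename = "lua_filters/logs_correlation.lua"
# Assumes each message has a Topic field set to "logs"; adjust the matcher
# to whatever actually identifies these messages in your setup.
message_matcher = "Fields[Topic] == 'logs'"
# Call the script's timer_event() every 60 seconds.
ticker_interval = 60
```

With a matcher like this, messages from both ModgenKafkaInput0 and
ModgenKafkaInput1 would reach the same filter, so ordering would hold
within each partition but not across the two.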
Best,
Antonin
* Rob Miller <[email protected]> [2015-04-15 07:41] wrote:
> On 04/14/2015 12:48 PM, Gil Fliker wrote:
> >Thx for the quick response,
> >
> >I am not yet familiar with all of Heka's features, specifically with
> >"message_matcher".
> That's one of Heka's fundamental concepts, please see
> http://hekad.readthedocs.org/en/v0.9.1/index.html,
> http://hekad.readthedocs.org/en/v0.9.1/getting_started.html and
> http://hekad.readthedocs.org/en/v0.9.1/message_matcher.html.
> >Let me just add that the reasoning behind the high number of partitions
> >is to enable parallelism to support the throughput needed.
> >
> >Can you please point me in a direction for a similar heka example ?
> Sorry, there's no existing example that I can point you to at the moment.
> We're happy to answer specific questions, to the extent we're able, but every
> massively parallel data processing infrastructure is going to be different;
> you're going to have to get familiar with the building blocks that Heka
> provides and drill down a bit before you'll be able to get a useful response.
> :)
>
> -r
>
> >
> >
> >Thx
> >
> >
> >
> >
> >Gil Fliker
> >
> >
> >On Tue, Apr 14, 2015 at 3:22 PM, Rob Miller <[email protected]> wrote:
> >
> > Yes, currently a single KafkaInput can only pull from a single Kafka
> > partition. You can think of Heka's KafkaInput as analogous to a
> > SimpleConsumer (see
> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example).
> >
> > If you want to manage inter-partition coordination, along the lines
> > of what is described as a "High Level Consumer"
> > (https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example),
> > you'd handle that at the filter layer. For instance, you might set
> > up a filter plugin with a message_matcher constructed such that it
> > catches all of the messages from a single topic, regardless of
> > partition, and perform any necessary correlations therein. The
> > delivery semantics to this filter would match that described on the
> > consumer group example page linked above, i.e. all of the messages
> > from a single partition will be received in the correct order, but
> > the messages from across partitions would be non-deterministically
> > interleaved.
> >
> > If there are so many partitions carrying so much data that a single
> > Heka instance can't handle them all, then you might have to have one
> > box handling one subset of partitions, another box processing a
> > different subset, and each of *those* in turn feeding into a third
> > box that performs the next level of correlation.
> >
> > In other words, the building blocks are there, but you have to
> > actually use them to put together a more sophisticated system. We're
> > unfortunately not yet at the point where there are higher level
> > constructs that will automatically distribute load for you.
> >
> > Hope this helps!
> >
> > -r
> >
> >
> >
> > On 04/10/2015 02:21 PM, Gil Fliker wrote:
> >
> > Hi,
> >
> > We are about to start a PoC using Heka.
> >
> > The plan is to pipe messages over Kafka, with Heka instances acting as
> > the endpoints that speak HTTP with various producers and consumers.
> >
> > I saw in the documentation that you have to specify a partition
> > number, and only one partition number. Is that right?
> >
> > Our Kafka topic setup will be made of around 1000 partitions.
> >
> > What is the best way to approach this ?
> >
> >
> > Thx
> >
> >
> > Gil Fliker
> >
> > Outbrain Operations Manager
> >
> > The above terms reflect a potential business arrangement, are
> > provided solely as a basis for further discussion, and are not
> > intended to be and do not constitute a legally binding obligation.
> > No legally binding obligations will be created, implied, or
> > inferred until an agreement in final form is executed in writing
> > by all parties involved.
> >
> > This email and any attachments hereto may be confidential or
> > privileged. If you received this communication by mistake, please
> > don't forward it to anyone else, please erase all copies and
> > attachments, and please let me know that it has gone to the wrong
> > person. Thanks.
> >
> >
> > _________________________________________________
> > Heka mailing list
> > [email protected]
> > https://mail.mozilla.org/listinfo/heka
> >
> >
> >
> >
>
>