To close the loop on this one: with Steve Niemitz's help, I resolved the issue. kafka-scheduler was going to the wrong master, and the one it reached returned a /master/state.json that had no information about registered slaves:
"slaves":[ ], What did I do? 1. Restarted zookeeper - running the docket image on marathon. 2. Edited kafka-mesos.properties with same content but changing seq. It started working and I just tested it by producing one test message and consuming it. Thanks to all of you, who helped, for your time. Vinit. PS: for some reason, Steve was not CCed on one of the replies, so I worked with him 1-1 below is the conversation thread, if anybody else runs into the same problem. On Mon, Jun 27, 2016 at 1:35 PM, Vinit Mahedia <vinitmahe...@gmail.com> wrote: > That exactly was the problem, it was going to the wrong master, I > restarted zookeeper and edited the kafka-mesos.properies with the same > content just changed the sequence of arguments, it discovered new master > after this. > > Although I don't understand why it was discovering wrong master though on > the start up, maybe stale state from ZK, but should not it get the correct > master on startup? > > This is a huge help, truly appreciate. > > > > On Mon, Jun 27, 2016 at 12:44 PM, Steve Niemitz <sniem...@apache.org> > wrote: > >> I think I know what your problem is, I've run into it before. If you >> don't have any slaves registered, mesos will just ignore any attempt to >> register a framework. >> >> I looked at the results from /master/state.json in your packet capture >> and you don't have any registered slaves. >> >> Try bringing up a slave and see if it works then. >> >> On Mon, Jun 27, 2016 at 3:25 PM, Vinit Mahedia <vinitmahe...@gmail.com> >> wrote: >> >>> I replied to the thread and realized that for some reason, Joe Stein's >>> reply did not have you the CC list, so forwarding you this. >>> >>> >>> Hi Steve, >>> >>> Finally, I am getting the Jetty error for bad HTTP message - logs >>> <https://gist.github.com/vmahedia/3a56e432d95c0b54912626293d449ec8>. >>> Although I get this after a long time. Here is the packet capture. >>> >>> Kafka running on 10.10.17.41 and mesos-master is running on 10.10.17.68. 
>>>
>>> Here is the screenshot, if you quickly want to look at the logs.
>>>
>>> [Screenshot: scheduler logs]
>>>
>>> [Screenshot: packet capture with only POST showing]
>>> [Screenshot: following the TCP stream of the above capture]
>>>
>>> On Fri, Jun 24, 2016 at 6:21 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>
>>>> +1 to setting --debug
>>>>
>>>> Also make sure you set --api correctly, via the CLI or the properties
>>>> file. It sounds suspiciously like the issue that keeps going back and
>>>> forth (or, since it isn't set right, not going back and forth...). It's
>>>> also why it works on the master, because your properties file may have
>>>> api as localhost or such.
>>>>
>>>> Thinking out loud quickly here, but overall definitely a first thing to
>>>> check.
>>>>
>>>> regards,
>>>>
>>>> ~ Joe Stein
>>>>
>>>> On Fri, Jun 24, 2016 at 12:05 AM, Steve Niemitz <sniem...@apache.org>
>>>> wrote:
>>>>
>>>>> Have you tried running the Kafka scheduler in debug? (Pass --debug to
>>>>> it, iirc.) That gives you a good amount of output in stdout/stderr.
>>>>>
>>>>> Also make sure the mesos lib that the scheduler is running matches
>>>>> your master version.
>>>>>
>>>>> Finally, make sure the master can communicate BACK to the scheduler on
>>>>> whatever port you set as the LIBPROCESS_PORT on the scheduler
>>>>> (firewall rules and such).
>>>>>
>>>>> If you want to post the stderr/stdout logs from the scheduler (with
>>>>> debug on) I can take a look.
>>>>>
>>>>> On Jun 23, 2016 4:41 PM, "Vinit Mahedia" <vinitmahe...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > I am running into an issue where the kafka framework can't register
>>>>> > with Mesos. In the packet capture I see the POST request to subscribe
>>>>> > on the mesos master box, but the mesos master does not respond, nor
>>>>> > does it log that it received the request, which it usually does on
>>>>> > any attempt by a framework.
>>>>> > Kafka-framework keeps re-sending the request and eventually gives up.
>>>>> >
>>>>> > Running locally, with Mesos in VM1 and kafka-framework in VM2, works
>>>>> > fine. Also, some people here reported that putting kafka-framework on
>>>>> > the mesos master works as well, but that is not an actual working
>>>>> > deployment.
>>>>> >
>>>>> > I can provide more information. Thanks for offering help, I really
>>>>> > appreciate your time.
>>>>> >
>>>>> > On Fri, Jun 17, 2016 at 1:36 PM, Steve Niemitz <sniem...@twitter.com>
>>>>> > wrote:
>>>>> >
>>>>> >> No issues here, we've been running two 8 broker clusters for ~a month
>>>>> >> without incident, and I plan on rolling it out to one of our larger
>>>>> >> (~40 broker) clusters next week.
>>>>> >>
>>>>> >> My experience with it has been really positive so far, it just pretty
>>>>> >> much worked out of the box. I'm curious what issues you ran into,
>>>>> >> happy to try to help if you want!
>>>>> >>
>>>>> >> On Fri, Jun 17, 2016 at 3:31 PM, Vinit Mahedia <vinitmahe...@gmail.com>
>>>>> >> wrote:
>>>>> >>
>>>>> >>> Hi Steve,
>>>>> >>>
>>>>> >>> How long has it been running without problems? I have read on the
>>>>> >>> mailing list some people complaining that brokers sometimes
>>>>> >>> disappear etc. Have you come across any such problems? Any other
>>>>> >>> issues that you had to take care of?
>>>>> >>>
>>>>> >>> I tried to use the version you specified and also took the latest
>>>>> >>> release of kafka, without luck, although I am happy to have found
>>>>> >>> someone for whom it's working.
>>>>> >>>
>>>>> >>> On Fri, Jun 10, 2016 at 3:57 PM, Steve Niemitz <sniem...@apache.org>
>>>>> >>> wrote:
>>>>> >>>
>>>>> >>> > Just to chime in, I've been running the 0.9.5.1 scheduler as a
>>>>> >>> > task on another slave without issues. (Aurora runs the kafka-mesos
>>>>> >>> > scheduler in my case).
>>>>> >>> >
>>>>> >>> > On Thu, Jun 9, 2016 at 2:50 PM, Vinit Mahedia <vinitmahe...@gmail.com>
>>>>> >>> > wrote:
>>>>> >>> >
>>>>> >>> > > Justin,
>>>>> >>> > >
>>>>> >>> > > When you say "working", does it mean kafka-scheduler still has to
>>>>> >>> > > be on the same box as mesos-master? Or do you have it working
>>>>> >>> > > without that constraint?
>>>>> >>> > >
>>>>> >>> > > On Wed, Jun 8, 2016 at 6:07 PM, Justin Ryan <jur...@ziprealty.com>
>>>>> >>> > > wrote:
>>>>> >>> > >
>>>>> >>> > > > inline
>>>>> >>> > > >
>>>>> >>> > > > On 6/8/16, 4:06 PM, "Justin Ryan" <jur...@ziprealty.com> wrote:
>>>>> >>> > > >
>>>>> >>> > > > >FYI, when I updated to the latest kafka-mesos (0.5.1.0) this
>>>>> >>> > > > >problem went away. FWIW, I’m actually using a branch which
>>>>> >>> > > > >updates kafka to 0.10.0.0 as well:
>>>>> >>> > > > >
>>>>> >>> > > >
>>>>> >>> > > > Correction: 0.9.5.1 (current git master)
>>>>> >>> > > >
>>>>> >>> > > > > PR for kafka 0.10.0.0 (tests still fail, someone else did the
>>>>> >>> > > > > bulk of porting but didn’t PR it):
>>>>> >>> > > > > https://github.com/mesos/kafka/pull/220
>>>>> >>> > > > > ( ./gradlew jar -x test gets a successful build )
>>>>> >>> > > > >
>>>>> >>> > > > > Issue for the problem discussed in this thread:
>>>>> >>> > > > > https://github.com/mesos/kafka/issues/199
>>>>> >>> > > > >
>>>>> >>> > > > >Cheers!
>>>>> >>> > > > >
>>>>> >>> > >
>>>>> >>> > > --
>>>>> >>> > > ~Vinit
>>>>> >>>
>>>>> >>> --
>>>>> >>> ~Vinit
>>>>> >
>>>>> > --
>>>>> > ~Vinit
>>>
>>> --
>>> ~Vinit
>>>
>>> --
>>> ~Vinit
>
> --
> ~Vinit

--
~Vinit
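
For anybody who hits the same symptom, the check Steve did by hand (looking at /master/state.json and noticing the empty "slaves" list) can be scripted. The following is only a rough sketch under a few assumptions: the helper names are made up for illustration, port 5050 is assumed as the master's HTTP port, and the only field it relies on is the "slaves" list quoted at the top of this thread.

```python
import json
import urllib.request


def registered_slaves(state):
    """Return the "slaves" list from a parsed /master/state.json document.

    A missing or non-list value is treated as "no registered slaves".
    """
    slaves = state.get("slaves", [])
    return slaves if isinstance(slaves, list) else []


def has_registered_slaves(state):
    """True when the master reports at least one registered slave."""
    return len(registered_slaves(state)) > 0


def fetch_state(master_host, port=5050, timeout=5.0):
    """Fetch and parse /master/state.json from the given master.

    The port default (5050) is an assumption; use whatever your master
    actually listens on.
    """
    url = "http://%s:%d/master/state.json" % (master_host, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def describe(state):
    """Build a one-line summary suitable for printing or logging."""
    count = len(registered_slaves(state))
    if count == 0:
        return "no registered slaves: framework registration may be ignored"
    return "%d registered slave(s)" % count
```

Something like `print(describe(fetch_state("10.10.17.68")))`, run from the scheduler box, would show whether the master the scheduler is pointed at has any slaves registered. An empty list is what I was seeing when the scheduler was going to the wrong master.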