There are no bookkeeper errors around the time the hedwig hub
disconnects from the bookies. However, a few minutes before the
disconnection, I see some exceptions thrown in the bookkeeper log.

I've attached the exceptions in a file.

We are trying to load test hedwig. We sustained around 1000 QPS on one
topic for about 30 minutes. We then raised the load to around 2000 QPS on
the same topic and got this error. The setup is 15 hedwig hubs and 15
bookies, with an ensemble size of 5 and a replication factor of 3.
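
For a rough sense of what this configuration means per bookie, here is a
small sketch. It assumes BookKeeper's round-robin striping, where each entry
goes to "replication factor" consecutive bookies of the ensemble, starting at
the entry id modulo the ensemble size. It also assumes each published message
becomes one ledger entry. The function names are illustrative only and are
not part of any BookKeeper or Hedwig API.

```python
def write_set(entry_id, ensemble_size=5, quorum_size=3):
    """Ensemble indices of the bookies that receive this entry.

    Assumption: round-robin striping, i.e. replica i of an entry goes to
    index (entry_id + i) % ensemble_size.
    """
    return [(entry_id + i) % ensemble_size for i in range(quorum_size)]


def per_bookie_write_rate(qps, ensemble_size=5, quorum_size=3):
    """Average entry writes per second seen by each bookie in the ensemble.

    Assumption: one ledger entry per published message, spread evenly.
    """
    return qps * quorum_size / ensemble_size


if __name__ == "__main__":
    for entry_id in range(5):
        print(entry_id, write_set(entry_id))
    for qps in (1000, 2000):
        print(qps, "QPS ->", per_bookie_write_rate(qps), "writes/s per bookie")
```

Under these assumptions, going from 1000 to 2000 QPS on a single topic doubles
the average write rate on each bookie in that ledger's ensemble, from about
600 to about 1200 entries per second.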

Regards,
Aniruddha.

On Tue, Apr 3, 2012 at 2:09 AM, Ivan Kelly <[email protected]> wrote:

> This type of disconnection occurs when there's a read timeout from one of
> the bookies. The cause could be something crashing on the bookie side, or
> simply a very slow network. What type of network are you running this in?
> Do you have any logs on the bookie side?
>
> -Ivan
>
> On 3 Apr 2012, at 03:22, Aniruddha Laud wrote:
>
> > While sending requests to a hedwig hub, the hub seems to disconnect from
> > the bookies and never connects back. The logfile contains
> >
> > 2012-04-02 22:33:09,207 - INFO [Hashed wheel timer #3:PerChannelBookieClient@409] - Disconnected from bookie: /10.35.84.103:3181
> > 2012-04-02 22:33:09,211 - INFO [Hashed wheel timer #4:PerChannelBookieClient@409] - Disconnected from bookie: /10.34.133.114:3181
> > 2012-04-02 22:33:09,214 - INFO [Hashed wheel timer #5:PerChannelBookieClient@409] - Disconnected from bookie: /10.35.89.103:3181
> > 2012-04-02 22:33:09,217 - INFO [Hashed wheel timer #8:PerChannelBookieClient@409] - Disconnected from bookie: /10.35.91.102:3181
> > 2012-04-02 22:33:09,247 - INFO [Hashed wheel timer #10:PerChannelBookieClient@409] - Disconnected from bookie: /10.34.234.125:3181
> > 2012-04-02 22:33:09,256 - INFO [Hashed wheel timer #7:PerChannelBookieClient@409] - Disconnected from bookie: /10.34.235.129:3181
> >
> > Some time before getting this message, the "Got response for ..."
> > messages stop and there are only "Successfully wrote request ..."
> > messages in the hedwig log file. The bookkeeper log-file shows no
> > indication of the connection being lost. All the bookies and hedwig
> > hubs are up and running and I am able to connect to them with the
> > hedwig console and able to create new topics and publish/subscribe to
> > them. But I'm not able to publish or subscribe to the topic that
> > caused the errors. About 200,000 entries were created in the topic
> > that caused this error.
> >
> > I'm unable to attach the log files or even portions of it, because the
> > relevant portions are around 3MB.
> >
> > Regards,
> > Aniruddha.
>
>
2012-04-03 22:18:49,224 - DEBUG [GarbageCollectorThread:GarbageCollectorThread@306] - Compacting entry log 0 below threshold 0.20000000298023224.
2012-04-03 22:18:49,224 - INFO  [GarbageCollectorThread:GarbageCollectorThread@368] - Compacting entry log : 0
2012-04-03 22:18:49,224 - WARN  [GarbageCollectorThread:EntryLogger@393] - Failed to get channel to scan entry log: 0.log
2012-04-03 22:18:49,224 - INFO  [GarbageCollectorThread:GarbageCollectorThread@375] - Premature exception when compacting 0
java.io.FileNotFoundException: No file for log 0
        at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:365)
        at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:339)
        at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:391)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.compactEntryLog(GarbageCollectorThread.java:371)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.doCompactEntryLogs(GarbageCollectorThread.java:309)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:227)
2012-04-03 22:18:49,225 - DEBUG [GarbageCollectorThread:GarbageCollectorThread@306] - Compacting entry log 1 below threshold 0.20000000298023224.
2012-04-03 22:18:49,225 - INFO  [GarbageCollectorThread:GarbageCollectorThread@368] - Compacting entry log : 1
2012-04-03 22:18:49,225 - WARN  [GarbageCollectorThread:EntryLogger@393] - Failed to get channel to scan entry log: 1.log
2012-04-03 22:18:49,225 - INFO  [GarbageCollectorThread:GarbageCollectorThread@375] - Premature exception when compacting 1
java.io.FileNotFoundException: No file for log 1
        at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:365)
        at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:339)
        at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:391)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.compactEntryLog(GarbageCollectorThread.java:371)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.doCompactEntryLogs(GarbageCollectorThread.java:309)
        at org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:227)
