If the code looks good, wouldn't that imply a bug of some kind in the 
FileChannel? Although I can add the sleep, the purpose of the test is to 
simulate a failure and the failover to the secondary agent. Obviously in that 
case there would be no sleep.
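
If the duplicate turns out to be legitimate under at-least-once delivery, one alternative to the sleep might be for the test to skip redelivered events instead of waiting. A minimal sketch of that idea follows. It assumes a plain Queue<String> standing in for the channel and a hypothetical helper named takeSkippingSeen; neither is part of Flume or of the actual test, and it assumes a redelivered event carries an identical body.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Sketch only: a Queue<String> stands in for the channel; no Flume types are used. */
class DuplicateTolerantRead {

    /**
     * Returns the next body that has not been consumed before, discarding
     * redelivered ones. Returns null when the queue is exhausted.
     */
    static String takeSkippingSeen(Queue<String> channel, Set<String> alreadySeen) {
        String body;
        while ((body = channel.poll()) != null) {
            if (alreadySeen.add(body)) {
                return body; // first time this body has been seen
            }
            // otherwise it is a redelivery of an earlier event, so skip it
        }
        return null;
    }

    public static void main(String[] args) {
        // Bodies already read from the primary side (hypothetical values).
        Set<String> seen = new HashSet<>(
                Arrays.asList("Test Primary 8", "Test Primary 9"));
        // The alternate side received a redelivered event before the new one.
        Queue<String> alternate = new ArrayDeque<>(
                Arrays.asList("Test Primary 9", "Test Alternate 0"));
        System.out.println(takeSkippingSeen(alternate, seen));
    }
}
```

With the values above, the redelivered "Test Primary 9" is dropped and "Test Alternate 0" is returned. That would only be appropriate if the duplicate is accepted as valid behavior rather than a bug.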

Ralph

On Sep 8, 2012, at 10:36 AM, Hari Shreedharan wrote:

> Hi Ralph,  
> 
> The code looks good. I think this might be due to a timing issue, though 
> I am not sure what. I'd suggest adding a Thread.sleep(2000) before the line 
> primarySource.stop(); so that, once your whole transaction is done, the test 
> waits for the Avro sink/RPC client to receive the success message and does 
> not send duplicates.  
> 
> Please let me know if that causes the failures to disappear.
> 
> 
> Thanks
> Hari
> 
> --  
> Hari Shreedharan
> 
> 
> On Saturday, September 8, 2012 at 10:01 AM, Ralph Goers wrote:
> 
>> Did you ever get a chance to look at this? I am still getting these failures 
>> almost every time it runs.
>> 
>> Ralph
>> 
>> On Aug 31, 2012, at 10:35 AM, Hari Shreedharan wrote:
>> 
>>> Thanks Ralph. Let me take a look at the code.  
>>> 
>>> --  
>>> Hari Shreedharan
>>> 
>>> 
>>> On Friday, August 31, 2012 at 9:38 AM, Ralph Goers wrote:
>>> 
>>>> Which file? The files from the FileChannel, the source or …? If you want 
>>>> the FileChannel stuff, unfortunately it only is failing on the machine 
>>>> where Gump runs and I don't have a clue how to get access to it or if it 
>>>> is even left around after the run. As I said, I've never had this fail on 
>>>> the Mac(s), my Linux system or in Jenkins. I have no idea what is peculiar 
>>>> about that system but I do know the tests take about twice as long as they 
>>>> do on my Mac.
>>>> 
>>>> If you want to look at the actual unit test source, it is at 
>>>> https://svn.apache.org/repos/asf/logging/log4j/log4j2/trunk/flume-ng/src/test/java/org/apache/logging/log4j/flume/appender/FlumeEmbeddedAppenderTest.java
>>>> 
>>>> Ralph
>>>> 
>>>> 
>>>> 
>>>> On Aug 31, 2012, at 12:15 AM, Hari Shreedharan wrote:
>>>> 
>>>>> It looks like the channel has already got the events before the source 
>>>>> stops. Can you send me a link to the actual file, so I can take a look?  
>>>>> 
>>>>> 
>>>>> Hari  
>>>>> 
>>>>> --  
>>>>> Hari Shreedharan
>>>>> 
>>>>> 
>>>>> On Thursday, August 30, 2012 at 9:44 AM, Ralph Goers wrote:
>>>>> 
>>>>>> Thanks Hari,
>>>>>> 
>>>>>> First, remember that the Flume agent is embedded in the Appender. So the 
>>>>>> Log4j EventSource is passing the event to the FileChannel. The Avro Sink 
>>>>>> then reads from the channel and sends it on. The unit test has two Avro 
>>>>>> Sources listening with each associated with its own MemoryChannel. The 
>>>>>> test logs 10 events then reads the 10 events from the primary 
>>>>>> MemoryChannel, each within its own transaction. The test then stops the 
>>>>>> primary source. Then it logs 10 more events and tries to read them from 
>>>>>> the alternate MemoryChannel.  
>>>>>> 
>>>>>> 
>>>>>> The code to read from the channel looks like:
>>>>>> 
>>>>>> for (int i = 0; i < 10; ++i) {
>>>>>>     StructuredDataMessage msg =
>>>>>>         new StructuredDataMessage("Test", "Test Primary " + i, "Test");
>>>>>>     EventLogger.logEvent(msg);
>>>>>> }
>>>>>> for (int i = 0; i < 10; ++i) {
>>>>>>     Transaction transaction = primaryChannel.getTransaction();
>>>>>>     transaction.begin();
>>>>>> 
>>>>>>     Event event = primaryChannel.take();
>>>>>>     Assert.assertNotNull(event);
>>>>>>     String body = getBody(event);
>>>>>>     String expected = "Test Primary " + i;
>>>>>>     Assert.assertTrue(
>>>>>>         "Channel contained event, but not expected message. "
>>>>>>             + "Received: " + body,
>>>>>>         body.endsWith(expected));
>>>>>>     transaction.commit();
>>>>>>     transaction.close();
>>>>>> }
>>>>>> 
>>>>>> primarySource.stop();
>>>>>> 
>>>>>> for (int i = 0; i < 10; ++i) {
>>>>>>     StructuredDataMessage msg =
>>>>>>         new StructuredDataMessage("Test", "Test Alternate " + i, "Test");
>>>>>>     EventLogger.logEvent(msg);
>>>>>> }
>>>>>> for (int i = 0; i < 10; ++i) {
>>>>>>     Transaction transaction = alternateChannel.getTransaction();
>>>>>>     transaction.begin();
>>>>>> 
>>>>>>     Event event = alternateChannel.take();
>>>>>>     Assert.assertNotNull(event);
>>>>>>     String body = getBody(event);
>>>>>>     String expected = "Test Alternate " + i;
>>>>>>     /* When running in Gump, Flume consistently returns the last event
>>>>>>        from the primary channel after the failover, which fails this
>>>>>>        test. */
>>>>>>     Assert.assertTrue(
>>>>>>         "Channel contained event, but not expected message. "
>>>>>>             + "Expected: " + expected + " Received: " + body,
>>>>>>         body.endsWith(expected));
>>>>>>     transaction.commit();
>>>>>>     transaction.close();
>>>>>> }
>>>>>> 
>>>>>> When I run this on my Mac it never fails. But Gump fails almost every 
>>>>>> time returning "Channel contained event, but not expected message. 
>>>>>> Expected: Test Alternate 0 Received: <128>1 2012-08-30T05:50:04.143Z 
>>>>>> vmgump MyApp - Test [Test@18060][mdc@18060] Test Primary 9"
>>>>>> 
>>>>>> Do we have any tests that are similar to this? I didn't see anything 
>>>>>> that tests failover in this way but I might have missed it.
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>> On Aug 30, 2012, at 9:22 AM, Hari Shreedharan wrote:
>>>>>> 
>>>>>>> Hi Ralph,  
>>>>>>> 
>>>>>>> Sorry, I missed this message earlier. How are you simulating failover 
>>>>>>> in your test? I did not look at your code. If the message was written 
>>>>>>> by the Avro Source on the client and the Avro Sink on the other side 
>>>>>>> simply did not get a success response, the failover sink processor 
>>>>>>> would retry the same message, since it would be rolled back by the 
>>>>>>> sink and the channel would then make it available to another sink. 
>>>>>>> Generally, if a message is not acked as being successfully written to 
>>>>>>> the channel by the Avro Source, the sink will roll back the 
>>>>>>> transaction and throw an EventDeliveryException. In the case of the 
>>>>>>> Failover SinkProcessor, this causes the next sink to pick it up.  
>>>>>>> 
>>>>>>> Also, note that Flume guarantees at-least-once semantics and weak 
>>>>>>> ordering. If a failure happens, it is possible that there will be 
>>>>>>> duplicates.  
>>>>>>> 
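
A minimal sketch of the rollback-and-retry sequence described above, as I understand it. The Receiver class and the send method are hypothetical stand-ins written for illustration; they are not Flume classes, and the real sink processor is more involved. The point is only that an event can be stored by the first receiver and still be resent to the second when the acknowledgement is lost.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of sink failover; none of these types are Flume classes. */
class FailoverDuplicateDemo {

    /** Hypothetical receiver: stores every event, but its ack can be lost. */
    static class Receiver {
        final List<String> stored = new ArrayList<>();
        boolean dropAck = false;

        /** Returns true if the sender saw the ack, false if the ack was lost. */
        boolean deliver(String event) {
            stored.add(event);   // the event is committed on the receiving side
            return !dropAck;     // but the sender may never learn that
        }
    }

    /** Tries the primary first; without an ack the event is resent to the alternate. */
    static void send(String event, Receiver primary, Receiver alternate) {
        if (!primary.deliver(event)) {
            alternate.deliver(event);
        }
    }

    public static void main(String[] args) {
        Receiver primary = new Receiver();
        Receiver alternate = new Receiver();

        send("Test Primary 8", primary, alternate);
        primary.dropAck = true; // simulate the primary stopping before it acks
        send("Test Primary 9", primary, alternate);

        System.out.println("primary:   " + primary.stored);
        System.out.println("alternate: " + alternate.stored);
    }
}
```

In this toy run "Test Primary 9" ends up in both lists, which matches the symptom in the failing test: the last primary event shows up as the first event on the alternate side.
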
>>>>>>> And no, this is not related to any of the FileChannel issues we have 
>>>>>>> been fixing.  
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Hari
>>>>>>> 
>>>>>>> --  
>>>>>>> Hari Shreedharan
>>>>>>> 
>>>>>>> 
>>>>>>> On Thursday, August 30, 2012 at 7:50 AM, Ralph Goers wrote:
>>>>>>> 
>>>>>>>> I'm going to try again. Does this problem sound familiar to anyone?  
>>>>>>>> 
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>> On Aug 27, 2012, at 3:36 PM, Ralph Goers wrote:
>>>>>>>> 
>>>>>>>>> Does anyone have any thoughts on this? Is it possibly related to any 
>>>>>>>>> of the issues already being fixed on the FileChannel?
>>>>>>>>> 
>>>>>>>>> Ralph
>>>>>>>>> 
>>>>>>>>> On Aug 26, 2012, at 4:05 PM, Ralph Goers wrote:
>>>>>>>>> 
>>>>>>>>>> I have successfully embedded Flume into the Log4j 2 Appender. 
>>>>>>>>>> However, I have a unit test that has Flume fail over from one 
>>>>>>>>>> AvroSink to another. When this happens under some circumstances I am 
>>>>>>>>>> getting the last message successfully delivered to the first source 
>>>>>>>>>> as the first message to the second source, which doesn't seem 
>>>>>>>>>> correct. The unit test is at 
>>>>>>>>>> https://svn.apache.org/repos/asf/logging/log4j/log4j2/trunk/flume-ng/src/test/java/org/apache/logging/log4j/flume/appender/FlumeEmbeddedAppenderTest.java
>>>>>>>>>> The odd thing is that I cannot get this to fail on my local machine. 
>>>>>>>>>> It only fails when Gump runs it, but there it fails fairly consistently.
>>>>>>>>>> 
>>>>>>>>>> The unit test has the AppenderSource connect to a FileChannel. Two 
>>>>>>>>>> AvroSinks are connected to the FileChannel via the Failover 
>>>>>>>>>> processor.
>>>>>>>>>> 
>>>>>>>>>> Is this a known behavior?
>>>>>>>>>> 
>>>>>>>>>> Ralph  
> 
