azotcsit commented on PR #1284: URL: https://github.com/apache/activemq/pull/1284#issuecomment-2309353953
Thanks to everyone for the comments! They were very helpful! @cshannon > So something needs fixing but I was just curious if using the built in JDK support might be a better option for maintainability going forward. Thanks for pointing this out! I checked https://issues.apache.org/jira/browse/AMQ-8122 ticket; I totally agree with the points mentioned in this ticket. So I re-wrote the existing test and split it to separate scenarios for 1,2,3,4 bytes characters. > I'm open to either using the changes here or the JDK implementation if we can get that to work (like I quickly added a test in my branch https://github.com/cshannon/activemq/commit/e68021d33bbc1396d7e72083ee17fd00b620e374 , although it's not quite workign all the way as i noted). The main thing is making sure it's correct and we don't introduce a bug I totally support moving to standard Java utilities/implementation. The only reason I had not done that initially - I thought that we wanted to minimize amount of changes. But since community seems to be open to getting rid of the custom implementation, I'd be glad to migrate to the standard implementation. @tabish121 > We do something similar in the AMQP codec in proton-j for this same reason. Could you please share a link to the corresponding codebase, so I can explore it. I'm not sure it is necessary, but I might find something that could be missed in this PR. > which requires a buffer sized to the expected output encoding size so to allow the openwire encoder to not have to guess (possibly more than once) the code was created. I had not found a standard method for calculating the encoded size, so I kept the custom implementation for that. The rest, I moved to standard methods. It should be still compatible with all existing protocols. @cshannon > I tried testing writing a string with 4 byte unicode (the same one in the unit test in this PR) with the old/existing encoder and then decoding it using the JDK UTF-8 decoder in my branch and it breaks. What's interesting is things decode fine with the modified method in this PR to decode. It is expected for the JDK UTF-8 decoder to fail as the existing encoder does not support 4-bytes characters. But the modified logic in the first commit should also fail to read data written by the existing encoder. Could you please double check this? I believe it did not work for me. > What's even more interesting to me is that even though the existnig code is only supposed to handle 3 byte unicode, i tried testing writing/reading the 4 byte unicode string and it worked (which I guess there's a bug or some other special handling in the custom code) Yes, this is tricky! If the old/existing logic is used for both writing and reading, then it works even for 4-byte characters. In fact, 3-bytes logic serializes 4-bytes data incorrectly, but since it reads data back (and expects incorrect bytes), it works. > So I'm curious what the big differences are between the JDK version and the modified version in this PR. Ideally, there should be no difference. In fact, I changed custom logic to the standard JDK methods and everything seems to be working the same way. > It's critical to make sure things will be able communicate across versions if sending 4 byte unicode. For example, any old messages in a data store would need to be able to be decoded still after upgrade and another example could be old clients with the old encoding logic sending messages with 4 byte unicode to a broker that has been updated. This is a very great point! I think the current changes are not backward-compatible. I'll give a few rounds of testing to backward-compatibility to make sure there is a problem. Most probably, it will require to develop some additional logic for that. Could you please point me to some code where there is some compatibility-related logic. I'd like to understand how this type of problems are handled currently. --- Summary for the second commit: * replaced custom serialization/deserialization logic with standard JDK methods * improved `DataByteArrayInputStreamTest.testNonAscii()` as per https://issues.apache.org/jira/browse/AMQ-8122 Next steps: * test current logic for compatibility between JMS and STOMP protocols * test compatibility for old client connected to new server * test compatibility for new client connected to old server * test compatibility for persisted messages after upgrade * further development based on testing results and PR comments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@activemq.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@activemq.apache.org For additional commands, e-mail: gitbox-h...@activemq.apache.org For further information, visit: https://activemq.apache.org/contact