Re: Future for the JOni regex library

Hannes Wallnoefer Thu, 09 May 2013 18:48:51 -0700

Sorry for the late reply again. I've been at JavaOne India most of the week and was a bit distracted.

All our changes to Joni were made under the original license, so in my understanding there shouldn't be any problem in taking them back. As I said, the patch is incomplete and probably won't make it into JDK8, but of course if you could make something out of it that would be wonderful!

We've just started discussing our plans for JDK9. I haven't contacted Marcin yet but I'll absolutely contact him and you about this.


Hannes

On 2013-05-06 23:39, Charles Oliver Nutter wrote:

Thanks for the updates, Hannes!

Wow, I didn't expect to hear you already had a regex compiler, but I
suppose I should have known :-) I wonder if we can port it back to
Joni and finish it...regex compilation has been on our want list for a
long time.

I understand what you mean about stripping stuff out. We have to use
Joni almost as-is because of the complexities of Ruby regex and
encoding logic, but there's not much need for you to do the same.
Sharing in the long term will probably be difficult.

I'm also really excited to hear that you will try to JEP this into
OpenJDK as the new regex backend. Have you been in contact with the
author of Joni, Marcin Mielzinsky? He would be very proud to know this
is in process, and obviously he deserves pretty much all the credit
for making this thing happen.

- Charlie

On Wed, May 1, 2013 at 5:50 PM, Hannes Wallnoefer
<hannes.wallnoe...@oracle.com> wrote:

Hi Charlie,

I feel a bit guilty for not getting (or keeping) in touch with you about
this. We recently switched to Joni as our default regexp engine and it's
working pretty well.

What we have in Nashorn now is still relatively close to the JRuby codebase.
Both share the same package structure, classes, and methods. Our code is
just simpler because it doesn't have to deal with different encodings. My
github fork contains a "noencoding" branch that represents the connection
between the two:

https://github.com/hns/joni/tree/noencoding

However, there are some factors that might push us to drift further apart.
One of them is code coverage. As it is, JavaScript uses a rather limited
subset of what Joni provides, and this means a lot of code is neither used
nor tested. Maintaining these bits doesn't seem to make sense (as far as
Nashorn is concerned).

It's a similar story with coding standards. We ran FindBugs over Joni and it
found a number of issues, including things like public final arrays. Fixing
these could require us to change the package structure or make other
structural changes. Not to mention missing Javadocs and obscure naming,
which would also drive us apart when fixed on our side.
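
To illustrate the FindBugs point about public final arrays, here is a minimal sketch. The class, field, and method names are hypothetical and not taken from Joni. It shows that `final` only fixes the reference, so any caller can still overwrite the elements, and one common fix (a private array behind accessors) that changes the public surface of the class, which is the kind of structural change mentioned above.

```java
public class FinalArrayDemo {

    // Hypothetical example of the flagged pattern: the reference is final,
    // but the array contents are writable by any code that can see the field.
    static final class OpTableExposed {
        public static final int[] OP_LENGTHS = {1, 2, 3};
        private OpTableExposed() {}
    }

    // One possible fix: keep the array private and expose read-only access.
    static final class OpTableEncapsulated {
        private static final int[] OP_LENGTHS = {1, 2, 3};
        private OpTableEncapsulated() {}

        // Element access without handing out the array itself.
        static int opLength(int opcode) {
            return OP_LENGTHS[opcode];
        }

        // Defensive copy for callers that need the whole table.
        static int[] opLengths() {
            return OP_LENGTHS.clone();
        }
    }

    public static void main(String[] args) {
        // Compiles and silently corrupts shared state.
        OpTableExposed.OP_LENGTHS[0] = 99;
        System.out.println("exposed[0] = " + OpTableExposed.OP_LENGTHS[0]);

        // Mutating the copy leaves the internal table untouched.
        int[] copy = OpTableEncapsulated.opLengths();
        copy[0] = 99;
        System.out.println("encapsulated[0] = " + OpTableEncapsulated.opLength(0));
    }
}
```

Callers that previously read the field directly would have to switch to the accessor, so a fix like this on one side only would make the two codebases diverge.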

As Jim said, I also worked on ASM bytecode generation and got quite far with
it except for some combinations of nested quantifiers and captures I
couldn't figure out. I've suspended the work for the time being since it's
not the highest priority thing to do, but here's the patch:

http://cr.openjdk.java.net/~hannesw/8012269/

I definitely think it would be a great idea to keep our versions of Joni
connected and evolving together. Right now this would still be relatively
easy, but it will become harder as time goes by.

Hannes


On 2013-05-01 22:10, Charles Oliver Nutter wrote:

Hello!

I saw a few weeks back that you guys have adopted JRuby's regex
engine, JOni, modified to work only with Java's char[]. We're thrilled
that you've found our engine useful enough to incorporate!

However, I'm wondering about the future of these engines. We have
planned improvements, patches that come in from time to time, and so
on, and maintaining two separate copies will eventually lead to them
diverging. But without any way to specialize our byte[]-based JOni for
char[] easily, I'm not sure what can be done.

Any thoughts on this? Just to tempt you... a few of the planned
improvements:

* JVM bytecode compiler, for more fastness
* Thread interruptible execution, to kill off regex runs that don't
complete

It would be great if we could collaborate on such things.
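
As a rough illustration of the "thread interruptible execution" item in the list above, the sketch below shows the general idea of checking the thread's interrupt flag while a match is running. It is not Joni code and does not use any Joni API: it uses `java.util.regex` as a stand-in and wraps the input in a hypothetical `InterruptibleCharSequence` whose `charAt` checks the flag. Inside an engine such as Joni, the equivalent check would presumably sit in the matcher's main loop instead.

```java
import java.util.regex.Pattern;

public class InterruptibleRegexSketch {

    // Hypothetical wrapper: every character read checks the interrupt flag,
    // so a runaway match can be aborted from another thread.
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override
        public char charAt(int index) {
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("regex match interrupted");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    // Runs a full match that aborts if the calling thread is interrupted.
    static boolean matches(Pattern pattern, CharSequence input) {
        return pattern.matcher(new InterruptibleCharSequence(input)).matches();
    }

    public static void main(String[] args) throws InterruptedException {
        // A pattern and input that can backtrack heavily on some engines.
        Pattern pattern = Pattern.compile("(x+x+)+y");
        String input = "x".repeat(40);

        Thread worker = new Thread(() -> {
            try {
                System.out.println("matched: " + matches(pattern, input));
            } catch (IllegalStateException e) {
                System.out.println("aborted: " + e.getMessage());
            }
        });
        worker.start();
        Thread.sleep(100);   // give the match a moment to run
        worker.interrupt();  // ask it to stop
        worker.join();
    }
}
```

Checking the flag on every character read costs something on each step; a real engine would more likely check every N iterations of its loop.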

- Charlie
