On 05/19/15 00:45, Jeroen Vermeulen wrote:
> On 19/05/15 09:22, Kenneth Heafield wrote:
>> There are non-traditional uses like ReadLine('\0') to read
>> null-delimited tokens.
> 
> While exploring this change I didn't find a single use of that parameter
> in the Moses source tree!

It's in some new code in KenLM master that I wouldn't feel bad about
changing.

> 
> But there may be uses outside the project of course.  That's one of the
> dangers of duplicating code.  Is there an overview somewhere of what
> code in Moses was copied in but is actually maintained elsewhere?

We tried git submodules and it was far worse :-\

> 
>> But I'd support Jeroen here: the default ReadLine() with no argument
>> should swallow \r.
> 
> To be clear though: with my change, FilePiece remains an exact binary
> representation of the file.  It's just that ReadLine() returns a
> slightly shorter piece of it, just like it already swallows \n.
> 
> (Side note: I've had a quick look at StringPiece now and it looks like a
> really useful abstraction for performance.  And apparently that same
> abstraction is going to be in the C++17 standard library as string_view.)

Welcome!  Or string_ref as the case may be.  I will happily upgrade to
the standard version when the time comes.

This is also why I consider most instances of vector<string> to be a
performance bug, including util/tokenize.hh .  Though I understand you
were just moving slow code around.

> 
>> In any case if you're going to change code there, can you do it upstream
>> in github.com/kpu/kenlm ?  I just gave you commit access.
> 
> Will do, thanks.  What's the procedure for "downstreaming" that into Moses?

Saw your commit, thanks.  It gets copied on occasion and I include the
kenlm git commit it came from.  If you want to copy it now that's fine.
 A bit more formally one could squash and merge.

> 
> 
>> Also, how would you feel if I changed it to be FakeIFStream with
>> operator>> extraction, at least for integer/float types?
> 
> Sorry, I haven't looked into FakeIFStream at all yet, and I may not
> fully understand the question.

It doesn't exist yet.  I am contemplating refactoring FilePiece to have
operator>> and renaming it to FakeIFStream.

> 
> 
> Jeroen
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

Reply via email to