On 05/19/15 00:45, Jeroen Vermeulen wrote:
> On 19/05/15 09:22, Kenneth Heafield wrote:
>> There are non-traditional uses like ReadLine('\0') to read
>> null-delimited tokens.
>
> While exploring this change I didn't find a single use of that parameter
> in the Moses source tree!
It's in some new code in KenLM master that I wouldn't feel bad about
changing.
>
> But there may be uses outside the project of course. That's one of the
> dangers of duplicating code. Is there an overview somewhere of what
> code in Moses was copied in but is actually maintained elsewhere?
We tried git submodules and it was far worse :-\
>
>> But I'd support Jeroen here: the default ReadLine() with no argument
>> should swallow \r.
>
> To be clear though: with my change, FilePiece remains an exact binary
> representation of the file. It's just that ReadLine() returns a
> slightly shorter piece of it, just like it already swallows \n.
>
> (Side note: I've had a quick look at StringPiece now and it looks like a
> really useful abstraction for performance. And apparently that same
> abstraction is going to be in the C++17 standard library as string_view.)
Welcome! Or string_ref as the case may be. I will happily upgrade to
the standard version when the time comes.
This is also why I consider most instances of vector<string> to be a
performance bug, including util/tokenize.hh . Though I understand you
were just moving slow code around.
>
>> In any case if you're going to change code there, can you do it upstream
>> in github.com/kpu/kenlm ? I just gave you commit access.
>
> Will do, thanks. What's the procedure for "downstreaming" that into Moses?
Saw your commit, thanks. It gets copied on occasion and I include the
kenlm git commit it came from. If you want to copy it now that's fine.
A bit more formally one could squash and merge.
>
>
>> Also, how would you feel if I changed it to be FakeIFStream with
>> operator>> extraction, at least for integer/float types?
>
> Sorry, I haven't looked into FakeIFStream at all yet, and I may not
> fully understand the question.
It doesn't exist yet. I am contemplating refactoring FilePiece to have
operator>> and renaming it to FakeIFStream.
>
>
> Jeroen
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support