Agreed about the cuteness of const Factor *.
Let's say you're reading space-delimited file input.
std::string line("Foo Bar Baz Quux .");
One can make a StringPiece(line.data(), 3) that looks and, for most
purposes, acts like std::string("Foo") but requires zero memory
allocation. It's not null terminated. It's just a const char * and a
length; it doesn't own the underlying memory, so it's only valid while
line is alive and unmodified. This makes it super fast to parse/split
text. util/tokenize_piece.hh provides an iterator operation for string
splitting.
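To make the pointer-plus-length idea concrete, here is a minimal sketch. PieceSketch, Prefix and SplitOnSpaces are hypothetical names made up for this email; they are not the real StringPiece or the tokenize_piece.hh iterators, just a stand-in showing why no per-token string gets allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for StringPiece: a pointer plus a length.
// It does not own the bytes and is not null terminated.
struct PieceSketch {
  const char *data;
  std::size_t size;

  // Copying out to std::string is the only step here that allocates.
  std::string str() const { return std::string(data, size); }
};

// View of the first n bytes of s, like StringPiece(line.data(), 3).
PieceSketch Prefix(const std::string &s, std::size_t n) {
  assert(n <= s.size());
  return PieceSketch{s.data(), n};
}

// Split on spaces. Every token points into line's buffer, so the
// result is only valid while line is alive and unmodified.
std::vector<PieceSketch> SplitOnSpaces(const std::string &line) {
  std::vector<PieceSketch> out;
  const char *p = line.data();
  const char *end = p + line.size();
  while (p != end) {
    while (p != end && *p == ' ') ++p;  // skip delimiters
    const char *start = p;
    while (p != end && *p != ' ') ++p;  // scan one token
    if (p != start) {
      out.push_back(PieceSketch{start, static_cast<std::size_t>(p - start)});
    }
  }
  return out;
}
```

With the line above, SplitOnSpaces(line) yields five views ("Foo", "Bar", "Baz", "Quux", "."). The vector itself still allocates, but no token is ever copied.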
Taking it a step further, util::FilePiece does a rolling mmap of a text
file and gives you StringPiece. Zero-copy file reading.
In Moses, the preference order for function parameters is: const Factor *,
then StringPiece, then std::string or char *.
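For anyone wondering why a pointer can double as a vocab id (as Hieu describes in the quoted message below), here is a rough sketch of string interning. VocabSketch, Intern and SameWord are hypothetical names; this is not the actual Moses Factor or factory code, only an illustration of the "one unique pointer per unique string" idea using a standard container.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical interning table, not the Moses implementation.
class VocabSketch {
 public:
  // Returns the same pointer every time the same string is interned.
  // std::unordered_set keeps pointers to its elements valid across
  // rehashing, so the returned pointer is stable while the table lives.
  const std::string *Intern(const std::string &word) {
    return &*pool_.insert(word).first;
  }

  std::size_t Size() const { return pool_.size(); }

 private:
  std::unordered_set<std::string> pool_;
};

// Comparing two interned words is a pointer comparison,
// with no character-by-character work.
bool SameWord(const std::string *a, const std::string *b) {
  assert(a != nullptr && b != nullptr);
  return a == b;
}
```

The pointer acts as the id, and dereferencing it gives the string back directly, which is the role factor->GetString() plays in the quoted message.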
On 10/10/2015 06:22 PM, Hieu Hoang wrote:
> Yep. The const Factor* is the original unique vocab ID and it's more
> useful IMO because you can get the string back without referring back
> to the vocab factory. But use what you like.
>
> StringPiece is apparently faster for some operations.
>
> On 10 Oct 2015 5:35 pm, "Lane Schwartz" <[email protected]
> <mailto:[email protected]>> wrote:
>
> Wouldn't factor->GetId() be the unique integer ID of the string?
>
> On Fri, Oct 9, 2015 at 5:54 PM, Hieu Hoang <[email protected]
> <mailto:[email protected]>> wrote:
>
> const Factor* is the vocab id. It's guaranteed to be unique for
> each unique string. You can map directly to the string using
> factor->GetString()
>
>
>
> On 09/10/2015 22:55, Lane Schwartz wrote:
>> Thanks, Marcin.
>>
>> So when the various components of Moses pass words back and
>> forth, what do they send each other? std::string? StringPiece?
>>
>> On Fri, Oct 9, 2015 at 4:28 PM, Marcin Junczys-Dowmunt
>> <[email protected] <mailto:[email protected]>> wrote:
>>
>> For instance in my phrase table that would be
>>
>> mosesdecoder/moses/TranslationModel/CompactPT/PhraseDecoder.h
>>
>> StringVector<unsigned char, unsigned, std::allocator>
>> m_sourceSymbols;
>> StringVector<unsigned char, unsigned, std::allocator>
>> m_targetSymbols;
>>
>> That's a memory-mapped vector of strings.
>>
>> On 09.10.2015 at 23:22, Lane Schwartz wrote:
>>> Seriously? That sounds inefficient.
>>>
>>> I've found code in KenLM that maps from strings to
>>> integers, but not the other way around.
>>>
>>> Marcin, do you know, for example, where any Moses code is
>>> for doing the mapping for any data structure?
>>>
>>>
>>> On Fri, Oct 9, 2015 at 4:14 PM, Marcin Junczys-Dowmunt
>>> <[email protected] <mailto:[email protected]>> wrote:
>>>
>>> Hi,
>>> This would only be a simple thing if there was a
>>> common framework for that, but there isn't. Each
>>> data structure implements its own vocabularies and
>>> look-up tables. There is no common set of integers.
>>> Best,
>>> Marcin
>>>
>>> On 09.10.2015 at 23:11, Lane Schwartz wrote:
>>>> Hey,
>>>>
>>>> I know this should be a simple thing to find, but
>>>> what code in Moses is responsible for mapping back
>>>> and forth between strings and integers?
>>>>
>>>> Thanks,
>>>> Lane
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected] <mailto:[email protected]>
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> When a place gets crowded enough to require ID's, social
>>> collapse is not
>>> far away. It is time to go elsewhere. The best thing
>>> about space travel
>>> is that it made it possible to go elsewhere.
>>> -- R.A. Heinlein, "Time Enough For Love"
>>
>
> --
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support