On Fri, Nov 6, 2015 at 4:32 AM, Nick Wellnhofer <[email protected]> wrote:
> Lucifers,
>
> Currently, Lucy uses CharBuf in two ways. One is to build Strings which is
> the typical use case. In two places they're used as a resizable buffer to
> hold UTF-8 data.
>
> S_write_terms_and_postings in PostingPool:
>
> https://github.com/apache/lucy/blob/master/core/Lucy/Index/PostingPool.c#L356

This location is an inner loop when building the index.  For performance
reasons, we want to avoid too many memory copies, hence the approach of
continually mutating a buffer of UTF-8 content.  We also want to avoid making
mistakes, hence the approach of using library code to perform string
manipulation rather than reinventing the wheel within Lucy.

Both of those concerns are negotiable, but that's the background on why
things are the way they are.
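
To make the buffer-reuse idea concrete, here is a minimal sketch in plain C.
It is not Lucy or Clownfish code: Utf8Buf, Utf8Buf_reserve and Utf8Buf_set
are hypothetical names invented for illustration, and the sketch assumes the
caller already holds valid UTF-8 (nothing is validated here).  The point is
only that one allocation is grown as needed and then overwritten in place on
every pass through the loop, instead of creating a fresh string each time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resizable byte buffer; NOT Lucy's CharBuf or ByteBuf. */
typedef struct {
    char   *ptr;   /* heap storage, may be NULL while cap == 0 */
    size_t  size;  /* bytes currently in use */
    size_t  cap;   /* bytes allocated */
} Utf8Buf;

/* Grow the allocation if it is too small; never shrinks.
 * Returns 0 on success, -1 if realloc fails (buffer left unchanged). */
static int
Utf8Buf_reserve(Utf8Buf *buf, size_t min_cap) {
    assert(buf != NULL);
    if (min_cap <= buf->cap) { return 0; }
    size_t new_cap = buf->cap ? buf->cap : 16;
    while (new_cap < min_cap) { new_cap *= 2; }
    char *grown = realloc(buf->ptr, new_cap);
    if (grown == NULL) { return -1; }
    buf->ptr = grown;
    buf->cap = new_cap;
    return 0;
}

/* Overwrite the buffer's contents with `len` bytes from `utf8`,
 * reusing the existing storage whenever it is large enough. */
static int
Utf8Buf_set(Utf8Buf *buf, const char *utf8, size_t len) {
    if (Utf8Buf_reserve(buf, len) != 0) { return -1; }
    if (len > 0) { memcpy(buf->ptr, utf8, len); }
    buf->size = len;
    return 0;
}
```

A loop would initialize one Utf8Buf to {NULL, 0, 0}, call Utf8Buf_set() on it
for each new piece of text, and free(buf.ptr) once at the end.  After the
buffer has grown to fit the longest input, later iterations copy bytes but do
not allocate.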

> In TextTermStepper:
>
>     https://github.com/apache/lucy/blob/master/core/Lucy/Plan/TextType.c
>
> I'm wondering whether we should restrict CharBuf to string building and
> rename it to something like StringBuilder. This would allow to remove a
> couple of methods. For resizable buffers, users could switch to ByteBuf,
> maybe with additional convenience methods to get Strings in and out of
> ByteBufs.

Any time there's an opportunity to destroy code, my ears prick up. :)  The
code in CharBuf certainly gets a lot of use for things like Str_newf(),
which is the StringBuilder case.  So then the question is what would
replace the performance-sensitive buffer case described above?

Marvin Humphrey
