On Fri, Nov 6, 2015 at 4:32 AM, Nick Wellnhofer <[email protected]> wrote:
> Lucifers,
>
> Currently, Lucy uses CharBuf in two ways. One is to build Strings which is
> the typical use case. In two places they're used as a resizable buffer to
> hold UTF-8 data.
>
> S_write_terms_and_postings in PostingPool:
>
> https://github.com/apache/lucy/blob/master/core/Lucy/Index/PostingPool.c#L356
This location is an inner loop when building the index. For performance reasons, we want to avoid too many memory copies, hence the approach of continually mutating a buffer of UTF-8 content. We also want to avoid making mistakes, hence the approach of using library code to perform string manipulation rather than reinventing the wheel within Lucy. Both of those concerns are negotiable, but that's the background on why things are the way they are.

> In TextTermStepper:
>
> https://github.com/apache/lucy/blob/master/core/Lucy/Plan/TextType.c
>
> I'm wondering whether we should restrict CharBuf to string building and
> rename it to something like StringBuilder. This would allow to remove a
> couple of methods. For resizable buffers, users could switch to ByteBuf,
> maybe with additional convenience methods to get Strings in and out of
> ByteBufs.

Any time there's an opportunity to destroy code, my ears prick up. :)

The code in CharBuf certainly gets a lot of use for things like Str_newf(), which is the StringBuilder case. So then the question is: what would replace the performance-sensitive buffer case described above?

Marvin Humphrey
