Disclaimer: I realize you wouldn't want to do this for anything other than a toy collection.
Perk: however, this overall discussion might also be useful to people wanting to use other codecs by default, for example the faster BlockPostingsFormat.

Old info online: Instructions for enabling SimpleText were written back in its early days. But in more recent versions of Solr these instructions are largely obsolete; you DON'T need to do most of that. You can just add postingsFormat="SimpleText" to a <fieldType> tag and get the new behavior. I believe it's similar for using the BlockPostingsFormat.

But when you do this (add it to text_general, for example), although your text fields reside in the new format, the other files in the index directory are still binary. By the time your debugging gets to your text field values, some "magic" has already happened via the other files (the system already knows about offsets into the file, for example).

Question: Can SimpleText even be used for the other binary files in an index? Or is it somehow specific in scope to field tokens?

Question: If it can be used for all the other files, what's the setting for that? I had seen a switch -Dtests.codec=SimpleText in the old instructions, but clearly that's for unit tests, and I wasn't sure of its scope or applicability.

Question: Has anybody tried using BlockPostingsFormat as a default codec (for all files)? Did it work? Was it faster than just applying it to your text fields?

Other questions... Or maybe there's some other aspect to all of this that I'm missing, some other question I should really be asking? The old posts online seem to assume a fairly deep understanding of Lucene & Solr's overall codec framework, which was appropriate at that time. But now it's included by default, so it's sort of "mainstream", and although I generally understand codecs, there are still aspects of it in Solr that I'm a bit hazy on; wondering if others have the same feeling?

Examples of things I'm a bit hazy on: Are there rules about which codecs can be used where? Can you mix and match codecs?
Can you chain them? I also saw the FilterCodec javadoc. Would I only use that if I want to reuse most of an existing codec, but alter just one part of it? I'm a bit fuzzy on combining that with other codecs. If there's a Java command-line -D switch that tells the system to use a different (but already existing) codec, then I don't think I'd need this at all?

--
Mark Bennett / LucidWorks: Search & Big Data / [email protected]
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
