On Mon, 23 Mar 2009, Thomas Koch wrote:

> 1) ezcSearch enforces suffixes
> 
> I had to change my solr schema, because ezcSearch appends a type suffix
> to all fields. There are pros and cons to this. I'd prefer to have the
> choice, whether I want to add those suffixes.
> 
> 2) Limited set of field types
> 
> Solr allows me to define all kinds of field types. However ezcSearch
> comes with a limited set of field types. As far as I see, the field
> types are only used ATM to append the suffixes.

For the above two, it should be fairly trivial to extend the handler
and change only those parts. Have you looked at that?
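
To illustrate what "extend the handler and change only those parts" could
look like, here is a minimal sketch. SuffixingHandlerSketch is a
hypothetical stand-in, not the real ezcSearchSolrHandler; the method name
mapFieldName() and the suffix table are assumptions made for the example,
so check the actual handler source for the method to override.

```php
<?php
// Hypothetical stand-in for a handler that appends a type suffix to
// every field name. NOT the real ezcSearchSolrHandler.
class SuffixingHandlerSketch
{
    // Assumed suffix table, for illustration only.
    protected $suffixes = array(
        'string' => '_s',
        'text'   => '_t',
        'int'    => '_l',
    );

    // Maps a logical field name to the name used in the Solr schema.
    public function mapFieldName($name, $type)
    {
        if (!isset($this->suffixes[$type])) {
            throw new InvalidArgumentException("Unknown field type: $type");
        }
        return $name . $this->suffixes[$type];
    }

    // Everything else in the handler goes through mapFieldName().
    public function buildField($name, $type, $value)
    {
        return array($this->mapFieldName($name, $type) => $value);
    }
}

// Subclass that changes only the mapping: no suffix, and no restriction
// on the field type, so the schema's own names and types are used as-is.
class PlainNameHandlerSketch extends SuffixingHandlerSketch
{
    public function mapFieldName($name, $type)
    {
        return $name;
    }
}

$default = new SuffixingHandlerSketch();
$plain   = new PlainNameHandlerSketch();

$withSuffix    = $default->buildField('title', 'text', 'Hello');
$withoutSuffix = $plain->buildField('title', 'text', 'Hello');
$customType    = $plain->buildField('location', 'geo', '52.1,4.3');
```

Because only the mapping method is overridden, the rest of the handler
stays untouched, which covers both the enforced suffixes and the limited
set of field types.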

> 3) Dates mapped to integer
> 
> The example solr config coming with Debian defines a date field type:
> <fieldType name="date" class="solr.DateField" sortMissingLast="true" 
> omitNorms="true"/>
> However dates are mapped to integers in ezcSearch. I'm not sure yet,
> whether I like this or not.

I had to do this because the date/time support in Solr 1.3 simply did
not work.
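
As a rough illustration of such a mapping, the sketch below stores a date
as an integer and converts it back. It assumes the integer is a Unix
timestamp in UTC; the function names are made up for the example and are
not part of ezcSearch.

```php
<?php
// Assumption: the integer stored in the index is a Unix timestamp (UTC).

// Converts a date string (interpreted as UTC) to an integer for indexing.
function dateToIndexInt($dateString)
{
    $date = new DateTime($dateString, new DateTimeZone('UTC'));
    return (int) $date->format('U');
}

// Converts the stored integer back to an ISO 8601 string in UTC.
function indexIntToDate($timestamp)
{
    $date = new DateTime('@' . $timestamp); // '@' timestamps are always UTC
    return $date->format('Y-m-d\TH:i:s\Z');
}

$stored   = dateToIndexInt('2009-03-23T12:00:00Z');
$restored = indexIntToDate($stored);
```

Integers sort and range-query in chronological order, so sorting and
range searches on the field still work with this representation.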

> 4) Indexing only one document at a time
> 
> Unfortunately I did not run a benchmark before I changed my code, but it
> felt much slower afterwards. I believe the main reason is, that I can
> index only one document at a time with ezcSearch, while I sent 200
> documents with one request in my hand crafted version.

That's not true; you can use transactions:
http://ezcomponents.org/docs/api/latest/Search/ezcSearchSolrHandler.html#beginTransaction
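
The calling pattern can be sketched as follows. IndexerSketch is a
hypothetical stand-in, not ezcSearchSolrHandler: apart from
beginTransaction(), which is the method linked above, the method names
are illustrative. The sketch also assumes that a transaction defers the
commit until the outermost commit() call, so please check the API
documentation for the real behaviour.

```php
<?php
// Hypothetical stand-in that only counts how often a commit would be
// sent to the search backend. NOT the real ezcSearchSolrHandler.
class IndexerSketch
{
    public $indexed = 0;
    public $commits = 0;
    private $depth = 0;

    public function beginTransaction()
    {
        $this->depth++;
    }

    public function index(array $document)
    {
        $this->indexed++;
        // Outside a transaction every document is committed on its own.
        if ($this->depth === 0) {
            $this->commits++;
        }
    }

    public function commit()
    {
        if ($this->depth === 0) {
            throw new LogicException('commit() without beginTransaction()');
        }
        $this->depth--;
        // Only the outermost commit() is sent to the backend.
        if ($this->depth === 0) {
            $this->commits++;
        }
    }
}

// Without a transaction: one commit per document.
$single = new IndexerSketch();
for ($i = 0; $i < 200; $i++) {
    $single->index(array('id' => $i));
}

// With a transaction: one commit for the whole batch.
$batched = new IndexerSketch();
$batched->beginTransaction();
for ($i = 0; $i < 200; $i++) {
    $batched->index(array('id' => $i));
}
$batched->commit();
```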

> 5) Slow implementation
> 
> Please have a look at the attached kcachegrind screenshot. The code
> indexes 200 documents. After the communication functions (fgets,
> fwrite), the most time is spent on mapping field names, verifying state
> (solr already does this for me) and slow HTTP handling via preg_*
> functions. Most of the slowest function calls however could be reduced
> to only a fraction of their calls, if I could index multiple documents
> at once. 

But you can do that... with transactions.

> One of my favorites are the preg_* functions. They are called 4849
> times. Derick seems to love them. I try to avoid them. In my last
> company we had a discussion, that we should avoid object orientation,
> because it would be slow. I made many benchmarks back then and
> discovered, that the slowest things in our code were the many preg_*
> functions.

Well, the other options are to do it with complicated string handling 
functions... which I doubt would be any faster. Let me know if you have 
any concrete suggestions here though.
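
One way to get concrete numbers is a small micro-benchmark like the
sketch below. Parsing an HTTP status line is just an example task chosen
here, not the code from the component, and the timings depend on the PHP
version and machine, so none are quoted.

```php
<?php
// Extracts the status code from an HTTP status line with a regex.
function statusViaPreg($line)
{
    if (preg_match('@^HTTP/\d\.\d (\d{3})@', $line, $match)) {
        return (int) $match[1];
    }
    return null;
}

// Same task with plain string functions. Slightly looser than the
// regex: it does not validate the version number after "HTTP/".
function statusViaString($line)
{
    if (strncmp($line, 'HTTP/', 5) !== 0) {
        return null;
    }
    $space = strpos($line, ' ');
    if ($space === false) {
        return null;
    }
    $code = substr($line, $space + 1, 3);
    // Accept only exactly three digits.
    return strspn($code, '0123456789') === 3 ? (int) $code : null;
}

// Runs $function $iterations times and returns the elapsed seconds.
function timeIt($function, $line, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($function, $line);
    }
    return microtime(true) - $start;
}

$line = 'HTTP/1.1 200 OK';
$iterations = 100000;
printf("preg_match:    %.4f s\n", timeIt('statusViaPreg', $line, $iterations));
printf("strpos/substr: %.4f s\n", timeIt('statusViaString', $line, $iterations));
```

The string version takes more code to get the same validation, which is
the trade-off to weigh against whatever difference the timings show.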

regards,
-- 
Derick Rethans
eZ components Product Manager
eZ systems | http://ez.no

-- 
Components mailing list
Components@lists.ez.no
http://lists.ez.no/mailman/listinfo/components
