[
https://issues.apache.org/jira/browse/SOLR-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925473#action_12925473
]
Robert Muir edited comment on SOLR-2202 at 10/27/10 1:38 PM:
-------------------------------------------------------------
Hi Greg, these are excellent questions... I'll reply only to the localization
ones and let Uwe or others talk about the Trie stuff.
bq. First, currency parsing in Java appears locale-dependent (which obviously
makes sense.) The concern here is that the locale of the end-user performing
queries is likely not the same as the locale of the search engine. Is there
currently a standard mechanism in Solr to acquire the user's locale? What do we
do for other internationalized components?
Well, in general components are internationalized, but this is actually a
localization problem. Usually for Solr, the Solr service is not handling the
end-user's request directly, so it's best if the Locale is somehow a parameter
and the default Locale is never used at all. In other words, it's up to you to
decide how to determine which Locale to use, and Solr would just respect that.
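To make this concrete, here is a minimal Java sketch in which the caller always passes the Locale explicitly (for example, derived from a request parameter). The class and method names are hypothetical, not Solr API. It shows why the server's default Locale is a poor choice: the same input string parses to different values under different Locales.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper: the Locale always comes from the request, never from Locale.getDefault().
final class ExplicitLocaleParsing {
    static Number parseAmount(String text, Locale requestLocale) {
        try {
            // getNumberInstance(Locale) picks that Locale's decimal and grouping separators.
            return NumberFormat.getNumberInstance(requestLocale).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in " + requestLocale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        // '.' is the decimal separator in en_US but the grouping separator in de_DE.
        System.out.println(parseAmount("1.234", Locale.US));      // 1.234
        System.out.println(parseAmount("1.234", Locale.GERMANY)); // 1234
    }
}
```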
bq. NumberFormat parsing fails to parse "10.00USD", or "10.00 USD", instead
relying upon the symbol. ("$10.00"). This seems like a limitation since
generally using the currency code as suffix is a locale-independent way of
specifying a monetary value, making indexing code easy to write (simply append
the currency code for the document to the value). It may very well be a good
idea to simply standardize on this approach for the purposes of indexing, and
avoid all the locale-specific issues that come up regarding currency symbols.
It's not really a limitation; it depends on the NumberFormat in use. The one
you used for parsing is just the default format for that Locale from
getCurrencyInstance(), but you can supply your own
[DecimalFormat|http://download.oracle.com/javase/1.5.0/docs/api/java/text/DecimalFormat.html]
pattern too. This is a printf/scanf-like pattern that can contain special
characters, particularly ¤ (\u00A4): Currency sign, replaced by currency
symbol. If doubled, replaced by international currency symbol. If present in a
pattern, the monetary decimal separator is used instead of the decimal
separator.
Ideally here, you could allow this pattern to be a parameter.
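A minimal sketch of a caller-supplied pattern, assuming the pattern and the ISO code arrive as parameters (the class, method, and constant names are hypothetical). Per the DecimalFormat documentation quoted above, a doubled ¤ expands to the international currency symbol, i.e. the ISO code, so a pattern ending in " ¤¤" accepts input such as "10.00 USD":

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Currency;
import java.util.Locale;

// Hypothetical helper: pattern and ISO code would come from configuration or the request.
final class PatternCurrencyParsing {
    // Amount, a space, then "¤¤", which expands to the ISO code (e.g. "USD").
    static final String CODE_SUFFIX_PATTERN = "#,##0.00 \u00A4\u00A4";

    static DecimalFormat currencyFormat(String pattern, String isoCode) {
        // Locale.US symbols are used here only for their '.' and ',' separators.
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        // setCurrency switches the symbols, and so the expanded "¤¤", to the requested currency.
        format.setCurrency(Currency.getInstance(isoCode));
        return format;
    }

    static double parse(String text, String pattern, String isoCode) {
        try {
            return currencyFormat(pattern, isoCode).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + text + "' with " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("10.00 USD", CODE_SUFFIX_PATTERN, "USD"));        // 10.0
        System.out.println(parse("1,234.50 EUR", CODE_SUFFIX_PATTERN, "EUR"));     // 1234.5
        System.out.println(currencyFormat(CODE_SUFFIX_PATTERN, "EUR").format(10)); // 10.00 EUR
    }
}
```

Because the pattern is an argument, the same helper would accept symbol-prefixed input like "$10.00" if given the pattern "¤#,##0.00" with the ISO code USD.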
bq. The NumberFormat parsing does not yield back the currency, just the value.
It seems the currency itself still needs to be extracted somehow. Is there a
built in mechanism to do this? Currently the patch iterates over all currencies
attempting to extract the symbol or code from the value.
Right, NumberFormat parses only the number itself. Really, it's best if the
currency ISO code (e.g. USD) is supplied as a parameter, because currency
symbols are not unique: for example, $ is used for many currencies. I think
this is what Solr should do. If the end-user application doesn't know the
currency somehow, it can use more sophisticated mechanisms to "guess" it; in
particular, ICU's "CurrencyMetaInfo" allows you to supply a "filter" based on
things like region and timeframe to get a list of the currencies used in that
region at that time. But I think Solr itself should just take the ISO code as
input.
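A minimal sketch of taking the ISO 4217 code as input with java.util.Currency (class and method names are hypothetical). Currency.getInstance(String) rejects unknown codes, and getSymbol shows that "$" alone is ambiguous. For the "guess it" fallback, the sketch swaps in the JDK's Currency.getInstance(Locale), which only reports a country's current currency; unlike ICU's CurrencyMetaInfo it has no timeframe filter.

```java
import java.util.Currency;
import java.util.Locale;

// Hypothetical helper, not Solr API.
final class CurrencyFromIsoCode {
    // Preferred path: the client states the ISO code.
    // Throws IllegalArgumentException if it is not a supported ISO 4217 code.
    static Currency fromIsoCode(String isoCode) {
        return Currency.getInstance(isoCode);
    }

    // Fallback guess from the user's region: the country's current currency only.
    // Throws IllegalArgumentException if the Locale has no country (e.g. Locale.ENGLISH).
    static Currency guessFromRegion(Locale userLocale) {
        return Currency.getInstance(userLocale);
    }

    public static void main(String[] args) {
        // Two different currencies share the same symbol in their home locales.
        System.out.println(fromIsoCode("USD").getSymbol(Locale.US));         // $
        System.out.println(fromIsoCode("CAD").getSymbol(Locale.CANADA));     // $
        System.out.println(guessFromRegion(Locale.JAPAN).getCurrencyCode()); // JPY
    }
}
```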
bq. How important is it that users have control over the currencies table? It
was quite useful to have the ability to define fake currencies for testing (as
is done in the example currency.xml file), it seems that if I changed the
implementation to use Java's currency table this might be a limitation if
non-testing oriented use-cases exist.
I don't think it's important to have fake currencies, though I think such
things can be done with the Locale SPI. We could just use real currencies for
testing.
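If tests switch to real currencies, java.util.Currency still offers useful variety, for instance different default fraction digits (JPY has 0, USD 2, BHD 3). A hypothetical fixture sketch; the class name and the rounding rule (HALF_EVEN to the currency's default fraction digits) are assumptions for illustration only:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;
import java.util.List;

// Hypothetical test fixture built from real ISO 4217 currencies instead of fake ones.
final class RealCurrencyFixtures {
    // Real currencies chosen to exercise 0, 2 and 3 minor-unit digits.
    static final List<String> TEST_CODES = List.of("JPY", "USD", "BHD");

    // Rounds an amount to the currency's default number of fraction digits (assumed policy).
    static BigDecimal normalize(BigDecimal amount, String isoCode) {
        int digits = Currency.getInstance(isoCode).getDefaultFractionDigits();
        return amount.setScale(digits, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        for (String code : TEST_CODES) {
            // Prints "JPY 10", "USD 10.50", "BHD 10.500".
            System.out.println(code + " " + normalize(new BigDecimal("10.5"), code).toPlainString());
        }
    }
}
```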
> Money FieldType
> ---------------
>
> Key: SOLR-2202
> URL: https://issues.apache.org/jira/browse/SOLR-2202
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Affects Versions: 1.5
> Reporter: Greg Fodor
> Attachments: SOLR-2202-lucene-1.patch, SOLR-2202-solr-1.patch,
> SOLR-2202-solr-2.patch
>
>
> Attached please find patches to add support for monetary values to
> Solr/Lucene with query-time currency conversion. The following features are
> supported:
> - Point queries (ex: "price:4.00USD")
> - Range queries (ex: "price:[$5.00 TO $10.00]")
> - Sorting.
> - Currency parsing by either currency code or symbol.
> - Symmetric & Asymmetric exchange rates. (Asymmetric exchange rates are
> useful if there are fees associated with exchanging the currency.)
> At indexing time, money fields can be indexed in a native currency. For
> example, if a product on an e-commerce site is listed in Euros, indexing the
> price field as "10.00EUR" will index it appropriately. By altering the
> currency.xml file, the sorting and querying against Solr can take into
> account fluctuations in currency exchange rates without having to re-index
> the documents.
> The new "money" field type is a polyfield which indexes two fields, one which
> contains the amount of the value and another which contains the currency code
> or symbol. The currency metadata (names, symbols, codes, and exchange rates)
> are expected to be in an xml file which is pointed to by the field type
> declaration in the schema.xml.
> The current patch is factored such that Money utility functions and
> configuration metadata lie in Lucene (see MoneyUtil and CurrencyConfig),
> while the MoneyType and MoneyValueSource lie in Solr. This was meant to
> mirror the work being done on the spatial field types.
> This patch has not yet been deployed to production but will be getting used
> to power the international search capabilities of the search engine at Etsy.
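The description's combination of native-currency values, symmetric and asymmetric exchange rates, and query-time conversion can be sketched without Solr. Everything below is hypothetical: the class names, the "amount followed by ISO code" input form, the in-memory rate table standing in for currency.xml, and the rule that a missing direction falls back to the reciprocal of the reverse rate are assumptions, not the patch's MoneyUtil, CurrencyConfig, or MoneyType.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the code from the attached patches.
final class MoneySketch {
    // The two parts a polyfield could index separately: amount and ISO currency code.
    static final class Money {
        final BigDecimal amount;
        final String currencyCode;

        Money(BigDecimal amount, String currencyCode) {
            this.amount = amount;
            this.currencyCode = currencyCode;
        }

        // Parses "10.00EUR" or "10.00 EUR"; assumes the last three characters are the ISO code.
        static Money parse(String text) {
            String s = text.trim();
            if (s.length() < 4) {
                throw new IllegalArgumentException("Expected <amount><ISO code>: " + text);
            }
            String code = s.substring(s.length() - 3);
            BigDecimal amount = new BigDecimal(s.substring(0, s.length() - 3).trim());
            return new Money(amount, code);
        }
    }

    // In-memory stand-in for the exchange rates kept in currency.xml.
    static final class ExchangeRates {
        private final Map<String, BigDecimal> rates = new HashMap<>();

        void put(String from, String to, String rate) {
            rates.put(from + ">" + to, new BigDecimal(rate));
        }

        BigDecimal rate(String from, String to) {
            if (from.equals(to)) {
                return BigDecimal.ONE;
            }
            BigDecimal direct = rates.get(from + ">" + to);
            if (direct != null) {
                return direct; // an explicit rate per direction allows asymmetric pairs
            }
            BigDecimal reverse = rates.get(to + ">" + from);
            if (reverse != null) {
                // Symmetric fallback: the reciprocal of the opposite direction.
                return BigDecimal.ONE.divide(reverse, MathContext.DECIMAL64);
            }
            throw new IllegalArgumentException("No rate for " + from + " -> " + to);
        }

        BigDecimal convert(Money money, String targetCode) {
            return money.amount.multiply(rate(money.currencyCode, targetCode));
        }
    }

    // Query-time range check: editing a rate changes the result without re-indexing.
    static boolean inRange(Money indexed, BigDecimal lo, BigDecimal hi,
                           String queryCode, ExchangeRates rates) {
        BigDecimal converted = rates.convert(indexed, queryCode);
        return converted.compareTo(lo) >= 0 && converted.compareTo(hi) <= 0;
    }

    public static void main(String[] args) {
        ExchangeRates rates = new ExchangeRates();
        rates.put("EUR", "USD", "1.25"); // made-up rates for illustration only
        rates.put("USD", "EUR", "0.75"); // differs from 1/1.25, i.e. an asymmetric pair
        rates.put("GBP", "USD", "1.50"); // one direction only: USD->GBP uses 1/1.50

        Money price = Money.parse("10.00EUR");
        System.out.println(rates.convert(price, "USD")); // 12.5000
        System.out.println(inRange(price, new BigDecimal("5.00"), new BigDecimal("10.00"), "USD", rates)); // false
        System.out.println(inRange(price, new BigDecimal("10.00"), new BigDecimal("15.00"), "USD", rates)); // true
    }
}
```

Keeping the rates outside the indexed values is what lets the range check above change its answer when currency.xml changes, without touching the documents.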