[ 
https://issues.apache.org/jira/browse/NUTCH-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051130#comment-18051130
 ] 

ASF GitHub Bot commented on NUTCH-3064:
---------------------------------------

lewismc commented on PR #825:
URL: https://github.com/apache/nutch/pull/825#issuecomment-3736568735

   This PR upgrades the `index-geoip` plugin to MaxMind GeoIP2 Java API 5.0.2 
and reworks the plugin to support multiple database types and in-memory 
caching.
   
   ## Changes
   
   ### Dependency Updates
   
   - `geoip2`: upgraded to **5.0.2**
   - `maxmind-db`: upgraded to **4.0.2**
   - `jackson-datatype-jsr310`: added **2.20.1** (new transitive dependency)
   
   ### Performance Improvement — CHMCache
   
   Database readers now use `CHMCache` (ConcurrentHashMap Cache) from the 
maxmind-db library for improved lookup performance:
   
   ```java
   DatabaseReader reader = new DatabaseReader.Builder(db)
       .withCache(new CHMCache())
       .build();
   ```
   
   This caches decoded database records in memory, avoiding repeated decoding 
and improving throughput when the same IP prefixes are queried repeatedly 
during indexing.
   
   ### New Configuration Options in `conf/nutch-default.xml`
   
   The plugin now supports multiple database types simultaneously. Configure 
each by setting its file path:
   
   | Property | Description |
   |----------|-------------|
   | `index.geoip.db.anonymous` | Anonymous IP database: identifies VPNs, proxies, Tor exit nodes |
   | `index.geoip.db.asn` | ASN database: autonomous system number and organization |
   | `index.geoip.db.city` | City database: city, subdivision, country, continent, coordinates |
   | `index.geoip.db.connection` | Connection Type database: Cable/DSL, Cellular, Corporate, Satellite |
   | `index.geoip.db.domain` | Domain database: second-level domain for the IP |
   | `index.geoip.db.isp` | ISP database: ISP name, organization, ASN |
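   
   As a rough illustration of how these properties might be consumed, the 
   sketch below maps each property name to a database type and keeps only the 
   ones with a non-empty path. It uses `java.util.Properties` as a stand-in for 
   the Nutch configuration object. The `DatabaseType` constants, the 
   `configuredPaths` helper, and the file paths are hypothetical and not taken 
   from the PR, and treating an unset property as "database disabled" is an 
   assumption.
   
   ```java
   import java.util.EnumMap;
   import java.util.Map;
   import java.util.Properties;

   final class GeoIpConfigSketch {

       /** Hypothetical enum: one constant per property listed in the table. */
       enum DatabaseType {
           ANONYMOUS("index.geoip.db.anonymous"),
           ASN("index.geoip.db.asn"),
           CITY("index.geoip.db.city"),
           CONNECTION("index.geoip.db.connection"),
           DOMAIN("index.geoip.db.domain"),
           ISP("index.geoip.db.isp");

           final String property;

           DatabaseType(String property) {
               this.property = property;
           }
       }

       /** Returns the database types whose property holds a non-blank path. */
       static Map<DatabaseType, String> configuredPaths(Properties conf) {
           Map<DatabaseType, String> paths = new EnumMap<>(DatabaseType.class);
           for (DatabaseType type : DatabaseType.values()) {
               String path = conf.getProperty(type.property, "").trim();
               if (!path.isEmpty()) {
                   paths.put(type, path);
               }
           }
           return paths;
       }

       public static void main(String[] args) {
           Properties conf = new Properties();
           // Hypothetical file locations, for illustration only.
           conf.setProperty("index.geoip.db.city", "/data/geoip/city.mmdb");
           conf.setProperty("index.geoip.db.asn", "/data/geoip/asn.mmdb");
           System.out.println(configuredPaths(conf).keySet()); // prints [ASN, CITY]
       }
   }
   ```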
   
   ### MaxMind Insights Web Service Support
   
   | Property | Description |
   |----------|-------------|
   | `index.geoip.insights.userid` | User ID for MaxMind Precision Insights API |
   | `index.geoip.insights.licensekey` | License key for the Insights API |
   
   ### Architecture Improvements
   
   - Refactored to support multiple databases via `EnumMap<DatabaseType, DatabaseReader>`
   - Each database type is loaded independently and queried in sequence
   - Proper resource cleanup via `Closeable` implementation
   - Graceful error handling per-database (one failure doesn't block others)
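   
   The points above can be sketched roughly as follows. This is a minimal, 
   library-free illustration and not the plugin's actual code: `Reader` stands 
   in for `DatabaseReader`, and `DatabaseType`, `ReaderFactory`, `lookupAll`, 
   and the file names are hypothetical. It shows an `EnumMap` of readers loaded 
   independently, queried in enum order, with a failure in one database logged 
   and skipped, and all readers released in `close()`.
   
   ```java
   import java.io.Closeable;
   import java.io.IOException;
   import java.util.EnumMap;
   import java.util.Map;

   final class GeoIpReadersSketch implements Closeable {

       /** Hypothetical enum: one constant per supported database. */
       enum DatabaseType { ANONYMOUS, ASN, CITY, CONNECTION, DOMAIN, ISP }

       /** Stand-in for DatabaseReader so the sketch runs without the geoip2 jar. */
       interface Reader extends Closeable {
           String lookup(String ip) throws IOException;

           @Override
           default void close() throws IOException { }
       }

       /** Hypothetical hook that opens a reader for one database file. */
       interface ReaderFactory {
           Reader open(DatabaseType type, String path) throws IOException;
       }

       private final Map<DatabaseType, Reader> readers = new EnumMap<>(DatabaseType.class);

       GeoIpReadersSketch(Map<DatabaseType, String> paths, ReaderFactory factory) {
           for (Map.Entry<DatabaseType, String> e : paths.entrySet()) {
               try {
                   readers.put(e.getKey(), factory.open(e.getKey(), e.getValue()));
               } catch (IOException ex) {
                   // One database failing to load does not block the others.
                   System.err.println("Skipping " + e.getKey() + ": " + ex.getMessage());
               }
           }
       }

       /** Queries every loaded database in enum order; failed lookups are skipped. */
       Map<DatabaseType, String> lookupAll(String ip) {
           Map<DatabaseType, String> results = new EnumMap<>(DatabaseType.class);
           for (Map.Entry<DatabaseType, Reader> e : readers.entrySet()) {
               try {
                   results.put(e.getKey(), e.getValue().lookup(ip));
               } catch (IOException ex) {
                   System.err.println("Lookup failed in " + e.getKey() + ": " + ex.getMessage());
               }
           }
           return results;
       }

       /** Closes every reader, rethrowing the first failure after all are attempted. */
       @Override
       public void close() throws IOException {
           IOException first = null;
           for (Reader reader : readers.values()) {
               try {
                   reader.close();
               } catch (IOException ex) {
                   if (first == null) {
                       first = ex;
                   } else {
                       first.addSuppressed(ex);
                   }
               }
           }
           readers.clear();
           if (first != null) {
               throw first;
           }
       }

       public static void main(String[] args) throws IOException {
           Map<DatabaseType, String> paths = new EnumMap<>(DatabaseType.class);
           paths.put(DatabaseType.CITY, "city.mmdb");    // hypothetical file
           paths.put(DatabaseType.ASN, "missing.mmdb");  // simulated load failure
           ReaderFactory factory = (type, path) -> {
               if (path.startsWith("missing")) {
                   throw new IOException(path + " not found");
               }
               return ip -> type + " data for " + ip;
           };
           try (GeoIpReadersSketch geo = new GeoIpReadersSketch(paths, factory)) {
               System.out.println(geo.lookupAll("192.0.2.1"));
           }
       }
   }
   ```
   
   In this sketch the ASN database fails to open, so only the City result is 
   returned and the failure is reported on standard error.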
   
   ## Files Modified
   
   - `src/plugin/index-geoip/` — plugin source, tests, dependencies, and config
   - `build.xml` — root build configuration
   - `conf/nutch-default.xml` — new GeoIP configuration properties
   - `src/plugin/build.xml` — plugin build configuration
   - `src/plugin/indexer-solr/schema.xml` — Solr schema field definitions
   




> Upgrade index-geoip to GeoIP2 5.0.2
> -----------------------------------
>
>                 Key: NUTCH-3064
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3064
>             Project: Nutch
>          Issue Type: Task
>          Components: index-geoip, plugin
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.22
>
>
> A recent mailing list question about the index-geoip plugin prompted me to 
> take a look at it and perform any necessary maintenance. 
> As of writing, the latest dependency can be found at 
> [https://central.sonatype.com/artifact/com.maxmind.geoip2/geoip2] at v4.2.0.
> At a minimum this ticket will accomplish the dependency update. I'll also 
> have a look at documentation and maybe provide some unit tests... which I 
> neglected to furnish last time around.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
