Re: [Dbp-spotlight-users] Performance is Slower when loading the index

Pablo N. Mendes Tue, 13 Nov 2012 10:49:32 -0800

What about this:

mvn scala:run -Dlauncher=Server "-DjavaOpts.Xmx=50G" "-DaddArgs=../conf/server.properties"


And change this setting in server.properties (your file currently has jdbc):
org.dbpedia.spotlight.core.database = lucene

Also, let it run for a while until it "warms up": send a few hundred
requests first, and only then measure the time.
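
A minimal sketch of that measurement procedure in Python: run a batch of
untimed warm-up requests, then time a second batch. The /annotate path, the
text, confidence and support parameters, the sample sentence, and the counts
of 300 and 100 are assumptions for illustration; the base URI is the one from
the server.properties quoted below.

```python
import statistics
import time
import urllib.parse
import urllib.request


def time_after_warmup(send_request, warmup=300, measured=100):
    """Call send_request() `warmup` times untimed, then time `measured` calls.

    Returns the list of per-request latencies in seconds.
    """
    for _ in range(warmup):
        send_request()  # untimed: these calls only serve to warm the server up
    latencies = []
    for _ in range(measured):
        start = time.perf_counter()
        send_request()
        latencies.append(time.perf_counter() - start)
    return latencies


def annotate(text, base_uri="http://localhost:2222/rest"):
    """Send one annotation request (endpoint path and parameters are assumed)."""
    query = urllib.parse.urlencode(
        {"text": text, "confidence": 0.2, "support": 20}
    )
    with urllib.request.urlopen(base_uri + "/annotate?" + query) as response:
        return response.read()


def benchmark_local_server(sample="Berlin is the capital of Germany."):
    """Hypothetical driver: warm up, then report the median latency."""
    timings = time_after_warmup(lambda: annotate(sample))
    return statistics.median(timings)
```

Calling benchmark_local_server() against a running instance returns the
median latency of the timed batch. Sending the same sentence every time may
flatter the numbers if results are cached, so a list of varied texts is
probably a fairer test.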

Cheers,
Pablo



On Tue, Nov 13, 2012 at 6:22 PM, Essam Elsherif <[email protected]> wrote:

> I am using   mvn scala:run '-DaddArgs=../conf/server.properties'
>
> I increased the RAM to 64G. Performance is still the same. The requested
> info is below.
>
> ===========Begin=============./config/server.properties=======================
> # Server hostname and port to be used by DBpedia Spotlight REST API
> org.dbpedia.spotlight.web.rest.uri = http://localhost:2222/rest
>
> # Internationalization (i18n) support -- work in progress
> org.dbpedia.spotlight.default_namespace = http://dbpedia.org/resource/
> org.dbpedia.spotlight.default_ontology= http://dbpedia.org/ontology/
> # Defines the languages the system should support.
> org.dbpedia.spotlight.language = English
> org.dbpedia.spotlight.language_i18n_code = en
> # Stop word list
> # An example can be downloaded from: http://spotlight.dbpedia.org/download/release-0.4/stopwords.en.list
> org.dbpedia.spotlight.data.stopWords.english = /data/spotlight/data/stopwords.en.list
> org.dbpedia.spotlight.data.stopWords.portuguese = /data/spotlight/data/stopwords.pt.list
>
> #----- SPOTTING -------
>
> # Comma-separated list of spotters to load.
> # Accepted values are LingPipeSpotter,WikiMarkupSpotter,AtLeastOneNounSelector,CoOccurrenceBasedSelector,NESpotter,OpenNLPNGramSpotter,OpenNLPChunkerSpotter,KeaSpotter
> # Some spotters may require extra files and config parameters. See org.dbpedia.spotlight.model.SpotterConfiguration
> org.dbpedia.spotlight.spot.spotters = LingPipeSpotter,WikiMarkupSpotter
> org.dbpedia.spotlight.spot.selectors = ShortSurfaceFormSelector
>
> # Path to serialized LingPipe dictionary used by LingPipeSpotter
> org.dbpedia.spotlight.spot.dictionary = /data/spotlight/data/compact/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary
> #org.dbpedia.spotlight.spot.allowOverlap = false
> #org.dbpedia.spotlight.spot.caseSensitive = true
>
> # Configurations for the CoOccurrenceBasedSelector
> # From: http://spotlight.dbpedia.org/download/release-0.5/spot_selector.tgz
> org.dbpedia.spotlight.spot.cooccurrence.datasource = ukwac
> org.dbpedia.spotlight.spot.cooccurrence.database.jdbcdriver = org.hsqldb.jdbcDriver
> org.dbpedia.spotlight.spot.cooccurrence.database.connector = jdbc:hsqldb:file:/data/spotlight/data/spotsel/ukwac_candidate.script;shutdown=true&readonly=true
> org.dbpedia.spotlight.spot.cooccurrence.database.user = sa
> org.dbpedia.spotlight.spot.cooccurrence.database.password =
> org.dbpedia.spotlight.spot.cooccurrence.classifier.unigram = /data/spotlight/data/spotsel/ukwac_unigram.model
> org.dbpedia.spotlight.spot.cooccurrence.classifier.ngram = /data/spotlight/data/spotsel/ukwac_ngram.model
>
> # Path to serialized HMM model for LingPipe-based POS tagging. Required by AtLeastOneNounSelector and CoOccurrenceBasedSelector
> org.dbpedia.spotlight.tagging.hmm = /data/spotlight/data/pos-en-general-brown.HiddenMarkovModel
> # Path to dir containing several OpenNLP models for NER, chunking, etc. This is required for spotters that are based on OpenNLP.
> # Can be downloaded from http://spotlight.dbpedia.org/download/release-0.5/opennlp_models.tgz
> org.dbpedia.spotlight.spot.opennlp.dir = /data/spotlight/data/data//spotlight/3.7/opennlp
> org.dbpedia.spotlight.spot.opennlp.person = http://dbpedia.org/ontology/Person
> org.dbpedia.spotlight.spot.opennlp.organization = http://dbpedia.org/ontology/Organisation
> org.dbpedia.spotlight.spot.opennlp.location = http://dbpedia.org/ontology/Place
>
>
> # EXPERIMENTAL! Path to Kea Model
> org.dbpedia.spotlight.spot.kea.model = /data/spotlight/3.7/kea/keaModel-1-3-1
>
>
> #----- CANDIDATE SELECTION -------
>
> # Choose between jdbc or lucene for DBpedia Resource creation. Also, if the jdbc throws an error, lucene will be used.
> org.dbpedia.spotlight.core.database = jdbc
> org.dbpedia.spotlight.core.database.jdbcdriver = org.hsqldb.jdbcDriver
> org.dbpedia.spotlight.core.database.connector = jdbc:hsqldb:file:/data/spotlight/data/dbpedia-spotlight-db;shutdown=true&readonly=true
> org.dbpedia.spotlight.core.database.user = sa
> org.dbpedia.spotlight.core.database.password =
>
> # From http://spotlight.dbpedia.org/download/release-0.5/candidate-index-full.tgz
> org.dbpedia.spotlight.candidateMap.dir = /data/spotlight/data/candidateIndexTitRedDis
> org.dbpedia.spotlight.candidateMap.loadToMemory = true
> # Path to Lucene index containing only the candidate map. It is used by document-oriented disambiguators such as Document,TwoStepDisambiguator
> # Only used if one such disambiguator is loaded. Data is at: http://spotlight.dbpedia.org/download/release-0.5/candidate-index-full.tgz
> #org.dbpedia.spotlight.candidateMap.dir = dist/src/deb/control/data/usr/share/dbpedia-spotlight/index
>
>
> #----- DISAMBIGUATION -------
>
> # List of disambiguators to load: Document,Occurrences,CuttingEdge,Default
> org.dbpedia.spotlight.disambiguate.disambiguators =  Occurrences
>
> # Path to a directory containing Lucene index files. These can be downloaded from the website or created by org.dbpedia.spotlight.lucene.index.IndexMergedOccurrences
> org.dbpedia.spotlight.index.dir = /dev/shm/temp/medium/index-withSF-withTypes-compressed
> # Will attempt to load into RAM (the potentially huge) index from "org.dbpedia.spotlight.index.dir"
> org.dbpedia.spotlight.index.loadToMemory =  true
> # Class used to process context around DBpedia mentions (tokenize, stem, etc.)
> org.dbpedia.spotlight.lucene.analyzer = org.apache.lucene.analysis.en.EnglishAnalyzer
> org.dbpedia.spotlight.lucene.version = LUCENE_36
> # How large can the cache be for ICFDisambiguator.
> jcs.default.cacheattributes.MaxObjects = 15000
>
>
> #----- LINKING / FILTERING  -------
>
> # Configuration for SparqlFilter
> org.dbpedia.spotlight.sparql.endpoint = http://dbpedia.org/sparql
> org.dbpedia.spotlight.sparql.graph = http://dbpedia.org
>
>
> ===========End=============./config/server.properties=======================
>
> ===========Begin=============free -m -t=================================
> [aelshes@nj2utaepxapp01 dbpedia-spotlight]$ free -m -t
>              total       used       free     shared    buffers     cached
> Mem:         64458      56447       8011          0        106      26498
> -/+ buffers/cache:      29842      34616
> Swap:         1983       1549        433
> Total:       66442      57997       8445
>
> ===========End=============free -m -t=================================
>
> ===========Begin=============vmstat 5=================================
> [aelshes@nj2utaepxapp01 dbpedia-spotlight]$ vmstat 5
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  0 1587196 8511184 109284 27134160    2    3   129   153   29   21  1  0 98  0  0
>  1  0 1587196 8511184 109288 27134160    0    0     0    18 1015  490 25  0 75  0  0
>  1  0 1587196 8845116 109296 27134160    0    0     0     9 1010  499 26  0 74  0  0
>  2  0 1587196 9149660 109308 27134160    0    0     0     7 1015  559 26  0 73  0  0
>  1  0 1587196 9469704 109308 27134160    0    0     0     0 1011  538 27  0 73  0  0
>  1  0 1587196 9469580 109312 27134160    0    0     0    31 1014  491 25  0 75  0  0
>  1  0 1587196 9761352 109324 27134164    0    0     0    38 1016  590 26  0 74  0  0
>  1  0 1587196 10062044 109328 27134164    0    0     0     3 1017  487 26  0 73  0  0
>  1  0 1587196 10333364 109328 27134164    0    0     0     4 1015  483 26  0 74  0  0
>  1  0 1587196 10333364 109340 27134164    0    0     0    13 1012  471 25  0 75  0  0
>  0  0 1587196 10609884 109348 27134164    0    0     0     6 1021  514 24  0 76  0  0
>  0  0 1587196 10609884 109352 27134164    0    0     0     9 1010  491  0  0 100  0  0
> ===========End=============vmstat 5=================================
>
>   ------------------------------
> *From:* Pablo N. Mendes <[email protected]>
> *To:* Essam Elsherif <[email protected]>
> *Cc:* "[email protected]" <[email protected]>
> *Sent:* Tuesday, November 13, 2012 12:02 PM
> *Subject:* Re: [Dbp-spotlight-users] Performance is Slower when loading the index
>
>
> Can you share your config files, the command line that you are using to
> run it, and the output of vmstat or "free -m -t" while the system is
> running?
>
>
> On Tue, Nov 13, 2012 at 5:39 PM, Essam Elsherif <[email protected]> wrote:
>
> I successfully loaded the compact index and candidate map on a machine with
> 32G of memory and 2 CPUs. Running the build from trunk, my instance is still
> three times slower than the live Spotlight version. Any clue what the reason
> could be?
>
> Any help is appreciated.
>
> Thanks,
> Essam
>
>    ------------------------------
> *From:* Essam Elsherif <[email protected]>
> *To:* Pablo N. Mendes <[email protected]>
> *Cc:* "[email protected]" <[email protected]>
> *Sent:* Wednesday, November 7, 2012 2:05 PM
> *Subject:* Re: [Dbp-spotlight-users] Performance is Slower when loading the index
>
> Yes, around 24G is being used during annotation. Below is the output from
> top.
>
> I do not have a conf directory under rest.
>
> I used the command below. Performance is still slow.
>
> Thanks,
> Essam
>
>
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 26683 root      19   0 30.6g  20g  20m S 198.8 63.7  22:51.34 java
>     1 root      15   0 10364  564  528 S  0.0  0.0   0:00.74 init
>     2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.18 migration/0
>     3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
>     4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.13 migration/1
>
>
>   ------------------------------
> *From:* Pablo N. Mendes <[email protected]>
> *To:* Essam Elsherif <[email protected]>
> *Cc:* "[email protected]" <[email protected]>
> *Sent:* Wednesday, November 7, 2012 1:02 PM
> *Subject:* Re: [Dbp-spotlight-users] Performance is Slower when loading the index
>
>
> Do you observe that about 24G is filled when annotation is running? If
> not, you might not have successfully configured server.properties to
> load things into memory.
>
> From your command line it seems you have a "conf" directory under the
> "rest" module in addition to the original "conf" that sits on the project
> root? Did you also try:
>
> cd rest
> mvn scala:run -Dlauncher=Server "-DjavaOpts.Xmx=26G" "-DaddArgs=../conf/server.properties"
>
>
> Cheers,
> Pablo
>
>
> On Wed, Nov 7, 2012 at 6:56 PM, Essam Elsherif <[email protected]> wrote:
>
> I have 32G of memory. I set the -Xmx in rest/pom.xml to 26G.
>
> I am using mvn scala:run '-DaddArgs=./conf/server.properties' to run the
> server
>
> free -m shows most of the 2G swap is free while running the annotation.
>
>
> Thanks,
> Essam
>
>    ------------------------------
> *From:* Pablo N. Mendes <[email protected]>
> *To:* Essam Elsherif <[email protected]>
> *Cc:* "[email protected]" <[email protected]>
> *Sent:* Wednesday, November 7, 2012 11:42 AM
> *Subject:* Re: [Dbp-spotlight-users] Performance is Slower when loading the index
>
>
> Perhaps the system is paging? How much swap is used when the system is
> running?
> How much memory do you have?
> What command line did you use?
> What is the -Xmx specified in your pom.xml?
> Cheers
> pablo
> On Nov 7, 2012 10:21 AM, "Essam Elsherif" <[email protected]> wrote:
>
> Hi,
> I built Spotlight from the latest source and I am trying to run it on a
> server with 32G of RAM. When I load the compact index
> "index-withSF-withTypes-compressed" into memory, annotation is much
> slower than when I do not load it. Both are slower than the live Spotlight
> anyway.
> Any idea what is going on here?
>
> Thanks,
> essam
>
>
> _______________________________________________
> Dbp-spotlight-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users


-- 
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr
