Hi,

We are evaluating a deployment of Spotlight on EC2 instances. We
would like to know what performance we should expect, and which
settings we could tune to improve speed.

Our goal is to extract key concepts from various texts (English only, to start with).

So far we have followed the "Run from a JAR" installation process
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Run-from-a-JAR
We are running the 6.5 jar version.
The server.properties file was grabbed from
https://raw.github.com/dbpedia-spotlight/dbpedia-spotlight/master/conf/server.properties.

We query /rest/annotate with the following params:
    disambiguator=Document
    confidence=0.3
    support=10
And headers:
    Accept: application/json
    Content-Type: application/x-www-form-urlencoded
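
For reference, the request we send can be sketched roughly as follows. This is a minimal illustration in Python, not our actual client: the helper name build_annotate_request is made up for this example, the form field name "text" is assumed, and the URL is the org.dbpedia.spotlight.web.rest.uri value from our config plus /annotate.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: the rest.uri value from server.properties plus /annotate.
ANNOTATE_URL = "http://localhost:2222/rest/annotate"


def build_annotate_request(text, url=ANNOTATE_URL):
    """Build (but do not send) a POST request for /rest/annotate.

    Hypothetical helper; the "text" field name is an assumption.
    """
    body = urlencode({
        "text": text,
        "disambiguator": "Document",
        "confidence": "0.3",
        "support": "10",
    }).encode("utf-8")
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would actually send it; the sketch stops short of that so it can be inspected without a running server.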

Requests are sent from another EC2 instance to minimize network latency.
We set the -Xmx value to (maxMem - 1GB).
From a batch of 1000 texts with an average length of 3000 chars, we
measured the following throughput:

 - m1.large (7.5 GB, 2 cores, 4 EC2 compute units): 0.7 texts/sec
 - m1.xlarge (15 GB, 4 cores, 8 EC2 compute units): 13.7 texts/sec
 - m2.xlarge (17.1 GB, 2 cores, 6.5 EC2 compute units): 8.8 texts/sec
 - m2.2xlarge (34.2 GB, 4 cores, 13 EC2 compute units): 16.7 texts/sec
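
To make the figures easier to compare across instance types, one option is to divide each throughput by the instance's EC2 compute units. The sketch below does only that arithmetic on the numbers listed above; the function name per_compute_unit is made up for this example, and normalizing by compute units is just one possible way to look at the data.

```python
# (instance type, EC2 compute units, measured texts/sec), copied from the list above.
RESULTS = [
    ("m1.large", 4.0, 0.7),
    ("m1.xlarge", 8.0, 13.7),
    ("m2.xlarge", 6.5, 8.8),
    ("m2.2xlarge", 13.0, 16.7),
]


def per_compute_unit(results):
    """Return {instance type: texts/sec per EC2 compute unit}."""
    return {name: rate / ecu for name, ecu, rate in results}


if __name__ == "__main__":
    for name, value in per_compute_unit(RESULTS).items():
        print(f"{name}: {value:.2f} texts/sec per compute unit")
```

Seen this way, the m1.large result is far below the other three, which is part of why we are unsure whether our numbers are normal.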

Are these numbers similar to what should be expected?

From the output of our Spotlight server, it looks like the
disambiguation step is the most time-consuming. Do you have any tips
for speeding up the disambiguation?

Thanks,

Marc



Our server.properties config follows:

#### server.properties ####

org.dbpedia.spotlight.web.rest.uri = http://localhost:2222/rest
org.dbpedia.spotlight.default_namespace = http://dbpedia.org/resource/
org.dbpedia.spotlight.default_ontology= http://dbpedia.org/ontology/
org.dbpedia.spotlight.language = English
org.dbpedia.spotlight.language_i18n_code = en
org.dbpedia.spotlight.data.stopWords.english = /data/spotlight/stopwords.en.list
org.dbpedia.spotlight.spot.spotters = LingPipeSpotter,WikiMarkupSpotter,AtLeastOneNounSelector,CoOccurrenceBasedSelector

# Path to serialized LingPipe dictionary used by LingPipeSpotter
org.dbpedia.spotlight.spot.dictionary = /data/spotlight/surface_forms-Wikipedia-TitRedDis.thresh3.spotterDictionary
org.dbpedia.spotlight.spot.allowOverlap = false
org.dbpedia.spotlight.spot.caseSensitive = false

# Configurations for the CoOccurrenceBasedSelector
org.dbpedia.spotlight.spot.cooccurrence.datasource = ukwac
org.dbpedia.spotlight.spot.cooccurrence.database.jdbcdriver = org.hsqldb.jdbcDriver
org.dbpedia.spotlight.spot.cooccurrence.database.connector = jdbc:hsqldb:file:/data/spotlight/spotsel/ukwac_candidate;shutdown=true&readonly=true
org.dbpedia.spotlight.spot.cooccurrence.database.user = sa
org.dbpedia.spotlight.spot.cooccurrence.database.password =
org.dbpedia.spotlight.spot.cooccurrence.classifier.unigram = /data/spotlight/spotsel/ukwac_unigram.model
org.dbpedia.spotlight.spot.cooccurrence.classifier.ngram = /data/spotlight/spotsel/ukwac_ngram.model

# Path to serialized HMM model for LingPipe-based POS tagging.
# Required by AtLeastOneNounSelector and CoOccurrenceBasedSelector
org.dbpedia.spotlight.tagging.hmm = /data/spotlight/pos-en-general-brown.HiddenMarkovModel

org.dbpedia.spotlight.spot.opennlp.dir = /data/spotlight/3.7/opennlp
org.dbpedia.spotlight.spot.opennlp.location=http://dbpedia.org/ontology/Place

# From http://spotlight.dbpedia.org/download/release-0.5/candidate-index-full.tgz
org.dbpedia.spotlight.candidateMap.dir = /data/spotlight/candidateIndexTitRedDis
org.dbpedia.spotlight.candidateMap.loadToMemory = true

# List of disambiguators to load: Document,Occurrences,CuttingEdge,Default
org.dbpedia.spotlight.disambiguate.disambiguators = Document

# Path to a directory containing Lucene index files. These can be
# downloaded from the website or created by
# org.dbpedia.spotlight.lucene.index.IndexMergedOccurrences
org.dbpedia.spotlight.index.dir = /data/spotlight/index-withSF-withTypes-compressed
org.dbpedia.spotlight.index.loadToMemory = false
org.dbpedia.spotlight.lucene.analyzer = org.apache.lucene.analysis.en.EnglishAnalyzer
org.dbpedia.spotlight.lucene.version = LUCENE_36
jcs.default.cacheattributes.MaxObjects = 5000

# Configuration for SparqlFilter
org.dbpedia.spotlight.sparql.endpoint = http://dbpedia.org/sparql
org.dbpedia.spotlight.sparql.graph = http://dbpedia.org

#######################
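
In case it helps when comparing configs, here is a small sketch that pulls the loadToMemory switches out of a properties file like the one above. It is only an illustration: parse_properties and memory_settings are made-up helper names, and the parser assumes every property sits on a single "key = value" line (no continuation lines, no escapes), which is simpler than the full Java properties format.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def memory_settings(props):
    """Return the entries whose key ends in 'loadToMemory'."""
    return {k: v for k, v in props.items() if k.endswith("loadToMemory")}


# Two lines copied from the config above, plus a comment line.
SAMPLE = """\
org.dbpedia.spotlight.candidateMap.loadToMemory = true
# comment lines are ignored
org.dbpedia.spotlight.index.loadToMemory = false
"""

if __name__ == "__main__":
    for key, value in memory_settings(parse_properties(SAMPLE)).items():
        print(f"{key} = {value}")
```

In our config the candidate map is loaded into memory while the Lucene index is not.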
