[
https://issues.apache.org/jira/browse/UIMA-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693013#comment-13693013
]
Mike Barborak commented on UIMA-3017:
-------------------------------------
Hi,
I was using this non-JCas approach to try to simplify the test case. Your
observation shows that I simplified it too much. So I have changed the test
case to be a bit more complex and closer to reality. Here are the results:
Test 1. Get a string from an FS using non-JCas API
round 0 total time 1: 6.980455698s
round 1 total time 1: 6.739509176s
round 2 total time 1: 6.732438268s
round 3 total time 1: 6.738521584s
round 4 total time 1: 6.73231265s
Test 2. Get a string from a POJO via an internal HashMap<String, String>
round 0 total time 2: 0.93895504s
round 1 total time 2: 0.916541992s
round 2 total time 2: 0.918598733s
round 3 total time 2: 0.912975135s
round 4 total time 2: 0.899861037s
Test 3. Get a string from an FS using JCAS API
round 0 total time 3: 1.174062393s
round 1 total time 3: 1.159312484s
round 2 total time 3: 1.158318541s
round 3 total time 3: 1.158692163s
round 4 total time 3: 1.158886561s
Test 4. Get a member string from a POJO
round 0 total time 4: 0.072869196s
round 1 total time 4: 0.067882641s
round 2 total time 4: 0.067807927s
round 3 total time 4: 0.067737007s
round 4 total time 4: 0.067756583s
Test 5. Get a FS reference using the JCas API
round 0 total time 5: 10.611032036s
round 1 total time 5: 10.081770229s
round 2 total time 5: 10.143019053s
round 3 total time 5: 11.222992977s
round 4 total time 5: 10.186875851s
Test 6. Get a FS reference using the JCas API and caching
jcasType.ll_cas.ll_getFSForRef in a HashMap
round 0 total time 6: 2.257271104s
round 1 total time 6: 2.232764892s
round 2 total time 6: 2.231799763s
round 3 total time 6: 2.234257571s
round 4 total time 6: 2.237339784s
First, I added code to reproduce your new set of times. This is test 3. As you
can see, I didn't get your times likely because I didn't reproduce your test.
But I actually think that comparing test 2 and 3 isn't as interesting as some
other comparisons I make below.
Second, I added a string member to my POJO and then got this member through a
method in a tight loop. This is test 4 and this is what I would expect to be
similar to Test 3 (rather than test 2) but it is about 16 times faster.
Third, I added a feature to my type that is a reference to a feature structure.
I timed getting the referenced feature structure in a tight loop. This is time
5.
Fourth, I extended the MyType class generated with JCasGen as WrappedMyType.
This class uses a HashMap<Integer, FeatureStructure> to cache the results of
ll_getFSForRef(). I timed getting the referenced feature structure with this
class in a tight loop and this is time 6. I would expect it to be similar to
Test 5 but it is about 4 times faster.
Thank you for considering these numbers.
Here is my test class:
{code:title=test/MyTest.java}
package test;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.Type;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.XMLInputSource;
import com.ibm.uima.matching.TestSupport;
public class MyTest {
static class MyClass {
String memberString;
Map<String, String> myFeatures = new HashMap<String, String>();
void setStringValue(String feature, String value) {
myFeatures.put(feature, value);
}
String getStringValue(String feature) {
return myFeatures.get(feature);
}
void setMemberStringValue(String value) {
memberString = value;
}
String getMemberStringValue() {
return memberString;
}
}
static class WrappedMyType extends MyType {
private Map<Integer, FeatureStructure> getFSForRefCache = new
HashMap<Integer, FeatureStructure>();
public WrappedMyType(JCas jCas) {
super(jCas);
}
protected FeatureStructure getFSForRef(int fsRef) {
if (fsRef == 0) return null;
FeatureStructure fs = getFSForRefCache.get(fsRef);
if (fs == null) {
fs = jcasType.ll_cas.ll_getFSForRef(fsRef);
getFSForRefCache.put(fsRef, fs);
}
return fs;
}
@Override
public MyType getMyTypeRef() {
if (MyType_Type.featOkTst && ((MyType_Type)jcasType).casFeat_myTypeRef ==
null)
jcasType.jcas.throwFeatMissing("myTypeRef", "test.MyType");
int refAddr = jcasType.ll_cas.ll_getRefValue(addr,
((MyType_Type)jcasType).casFeatCode_myTypeRef);
MyType result = (MyType) getFSForRef(refAddr);
return result;
}
}
static public void main(String[] argv) throws Exception {
InputStream stream =
TestSupport.class.getClassLoader().getResourceAsStream("MyTypes.xml");
TypeSystemDescription typeSystemDescription =
UIMAFramework.getXMLParser().parseTypeSystemDescription(new
XMLInputSource(stream, null));
CAS cas = CasCreationUtils.createCas(typeSystemDescription, null, null);
Type myType = cas.getTypeSystem().getType("test.MyType");
FeatureStructure fs = cas.createFS(myType);
Feature myFeature = myType.getFeatureByBaseName("myFeature");
fs.setStringValue(myFeature, "myString");
cas.addFsToIndexes(fs);
MyClass myInstance = new MyClass();
myInstance.setStringValue("myFeature2", "myString2");
myInstance.setMemberStringValue("myMemberString2");
MyType myType2 = new MyType(cas.getJCas());
myType2.setMyFeature("myString3");
myType2.setMyTypeRef(myType2);
myType2.addToIndexes();
WrappedMyType wrappedMyType2 = new WrappedMyType(cas.getJCas());
wrappedMyType2.setMyFeature("myString4");
wrappedMyType2.setMyTypeRef(myType2);
wrappedMyType2.addToIndexes();
long iterations = 100000000;
long rounds = 5;
double nanoSecsPerSec = 1000000000.0d;
System.out.println("Test 1. Get a string from an FS using non-JCas API");
for (int round = 0; round < rounds; round++) {
long start = System.nanoTime();
for (long i = 0; i < iterations; i++) {
fs.getStringValue(myFeature);
}
long end = System.nanoTime();
System.out.println("round " + round + " total time 1: " + ((end - start)
/ nanoSecsPerSec) + "s");
}
System.out.println("\nTest 2. Get a string from a POJO via an internal
HashMap<String, String>");
for (int round = 0; round < rounds; round++) {
long start = System.nanoTime();
for (long i = 0; i < iterations; i++) {
myInstance.getStringValue("myFeature2");
}
long end = System.nanoTime();
System.out.println("round " + round + " total time 2: " + ((end - start)
/ nanoSecsPerSec) + "s");
}
System.out.println("\nTest 3. Get a string from an FS using JCAS API");
for (int round = 0; round < rounds; round++) {
long start = System.nanoTime();
for (long i = 0; i < iterations; i++) {
myType2.getMyFeature();
}
long end = System.nanoTime();
System.out.println("round " + round + " total time 3: " + ((end - start)
/ nanoSecsPerSec) + "s");
}
System.out.println("\nTest 4. Get a member string from a POJO");
for (int round = 0; round < rounds; round++) {
long start = System.nanoTime();
for (long i = 0; i < iterations; i++) {
myInstance.getMemberStringValue();
}
long end = System.nanoTime();
System.out.println("round " + round + " total time 4: " + ((end - start)
/ nanoSecsPerSec) + "s");
}
System.out.println("\nTest 5. Get a FS reference using the JCas API");
for (int round = 0; round < rounds; round++) {
long start = System.nanoTime();
for (long i = 0; i < iterations; i++) {
myType2.getMyTypeRef();
}
long end = System.nanoTime();
System.out.println("round " + round + " total time 5: " + ((end - start)
/ nanoSecsPerSec) + "s");
}
System.out.println("\nTest 6. Get a FS reference using the JCas API and
caching jcasType.ll_cas.ll_getFSForRef in a HashMap");
for (int round = 0; round < rounds; round++) {
long start = System.nanoTime();
for (long i = 0; i < iterations; i++) {
wrappedMyType2.getMyTypeRef();
}
long end = System.nanoTime();
System.out.println("round " + round + " total time 6: " + ((end - start)
/ nanoSecsPerSec) + "s");
}
}
}
{code}
Here is my type system:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
<name>MyTypes</name>
<description/>
<version>1.0</version>
<vendor/>
<types>
<typeDescription>
<name>test.MyType</name>
<description/>
<supertypeName>uima.cas.TOP</supertypeName>
<features>
<featureDescription>
<name>myFeature</name>
<description/>
<rangeTypeName>uima.cas.String</rangeTypeName>
</featureDescription>
<featureDescription>
<name>myTypeRef</name>
<description/>
<rangeTypeName>test.MyType</rangeTypeName>
</featureDescription>
</features>
</typeDescription>
</types>
</typeSystemDescription>
{code}
Here is my TOP extension generated by JCasGen:
{code:title=test/MyType.java}
/* First created by JCasGen Tue Jun 25 08:23:54 EDT 2013 */
package test;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.JCasRegistry;
import org.apache.uima.jcas.cas.TOP_Type;
import org.apache.uima.jcas.cas.TOP;
/**
* Updated by JCasGen Tue Jun 25 08:23:54 EDT 2013
* XML source:
/home/mabarbor/interactive_wics_wksp/com.ibm.uima.matching/descriptors/MyTypes.xml
* @generated */
public class MyType extends TOP {
/** @generated
* @ordered
*/
@SuppressWarnings ("hiding")
public final static int typeIndexID = JCasRegistry.register(MyType.class);
/** @generated
* @ordered
*/
@SuppressWarnings ("hiding")
public final static int type = typeIndexID;
/** @generated */
@Override
public int getTypeIndexID() {return typeIndexID;}
/** Never called. Disable default constructor
* @generated */
protected MyType() {/* intentionally empty block */}
/** Internal - constructor used by generator
* @generated */
public MyType(int addr, TOP_Type type) {
super(addr, type);
readObject();
}
/** @generated */
public MyType(JCas jcas) {
super(jcas);
readObject();
}
/** <!-- begin-user-doc -->
* Write your own initialization here
* <!-- end-user-doc -->
@generated modifiable */
private void readObject() {/*default - does nothing empty block */}
//*--------------*
//* Feature: myFeature
/** getter for myFeature - gets
* @generated */
public String getMyFeature() {
if (MyType_Type.featOkTst && ((MyType_Type)jcasType).casFeat_myFeature ==
null)
jcasType.jcas.throwFeatMissing("myFeature", "test.MyType");
return jcasType.ll_cas.ll_getStringValue(addr,
((MyType_Type)jcasType).casFeatCode_myFeature);}
/** setter for myFeature - sets
* @generated */
public void setMyFeature(String v) {
if (MyType_Type.featOkTst && ((MyType_Type)jcasType).casFeat_myFeature ==
null)
jcasType.jcas.throwFeatMissing("myFeature", "test.MyType");
jcasType.ll_cas.ll_setStringValue(addr,
((MyType_Type)jcasType).casFeatCode_myFeature, v);}
//*--------------*
//* Feature: myTypeRef
/** getter for myTypeRef - gets
* @generated */
public MyType getMyTypeRef() {
if (MyType_Type.featOkTst && ((MyType_Type)jcasType).casFeat_myTypeRef ==
null)
jcasType.jcas.throwFeatMissing("myTypeRef", "test.MyType");
return
(MyType)(jcasType.ll_cas.ll_getFSForRef(jcasType.ll_cas.ll_getRefValue(addr,
((MyType_Type)jcasType).casFeatCode_myTypeRef)));}
/** setter for myTypeRef - sets
* @generated */
public void setMyTypeRef(MyType v) {
if (MyType_Type.featOkTst && ((MyType_Type)jcasType).casFeat_myTypeRef ==
null)
jcasType.jcas.throwFeatMissing("myTypeRef", "test.MyType");
jcasType.ll_cas.ll_setRefValue(addr,
((MyType_Type)jcasType).casFeatCode_myTypeRef,
jcasType.ll_cas.ll_getFSRef(v));}
}
{code}
And here is the TOP_Type extension created by JCasGen:
{code:title=test/MyType_Type.java}
/* First created by JCasGen Tue Jun 25 08:23:54 EDT 2013 */
package test;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.JCasRegistry;
import org.apache.uima.cas.impl.CASImpl;
import org.apache.uima.cas.impl.FSGenerator;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.impl.TypeImpl;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.impl.FeatureImpl;
import org.apache.uima.cas.Feature;
import org.apache.uima.jcas.cas.TOP_Type;
/**
* Updated by JCasGen Tue Jun 25 08:23:54 EDT 2013
* @generated */
public class MyType_Type extends TOP_Type {
/** @generated */
@Override
protected FSGenerator getFSGenerator() {return fsGenerator;}
/** @generated */
private final FSGenerator fsGenerator =
new FSGenerator() {
public FeatureStructure createFS(int addr, CASImpl cas) {
if (MyType_Type.this.useExistingInstance) {
// Return eq fs instance if already created
FeatureStructure fs =
MyType_Type.this.jcas.getJfsFromCaddr(addr);
if (null == fs) {
fs = new MyType(addr, MyType_Type.this);
MyType_Type.this.jcas.putJfsFromCaddr(addr, fs);
return fs;
}
return fs;
} else return new MyType(addr, MyType_Type.this);
}
};
/** @generated */
@SuppressWarnings ("hiding")
public final static int typeIndexID = MyType.typeIndexID;
/** @generated
@modifiable */
@SuppressWarnings ("hiding")
public final static boolean featOkTst =
JCasRegistry.getFeatOkTst("test.MyType");
/** @generated */
final Feature casFeat_myFeature;
/** @generated */
final int casFeatCode_myFeature;
/** @generated */
public String getMyFeature(int addr) {
if (featOkTst && casFeat_myFeature == null)
jcas.throwFeatMissing("myFeature", "test.MyType");
return ll_cas.ll_getStringValue(addr, casFeatCode_myFeature);
}
/** @generated */
public void setMyFeature(int addr, String v) {
if (featOkTst && casFeat_myFeature == null)
jcas.throwFeatMissing("myFeature", "test.MyType");
ll_cas.ll_setStringValue(addr, casFeatCode_myFeature, v);}
/** @generated */
final Feature casFeat_myTypeRef;
/** @generated */
final int casFeatCode_myTypeRef;
/** @generated */
public int getMyTypeRef(int addr) {
if (featOkTst && casFeat_myTypeRef == null)
jcas.throwFeatMissing("myTypeRef", "test.MyType");
return ll_cas.ll_getRefValue(addr, casFeatCode_myTypeRef);
}
/** @generated */
public void setMyTypeRef(int addr, int v) {
if (featOkTst && casFeat_myTypeRef == null)
jcas.throwFeatMissing("myTypeRef", "test.MyType");
ll_cas.ll_setRefValue(addr, casFeatCode_myTypeRef, v);}
/** initialize variables to correspond with Cas Type and Features
* @generated */
public MyType_Type(JCas jcas, Type casType) {
super(jcas, casType);
casImpl.getFSClassRegistry().addGeneratorForType((TypeImpl)this.casType,
getFSGenerator());
casFeat_myFeature = jcas.getRequiredFeatureDE(casType, "myFeature",
"uima.cas.String", featOkTst);
casFeatCode_myFeature = (null == casFeat_myFeature) ?
JCas.INVALID_FEATURE_CODE : ((FeatureImpl)casFeat_myFeature).getCode();
casFeat_myTypeRef = jcas.getRequiredFeatureDE(casType, "myTypeRef",
"test.MyType", featOkTst);
casFeatCode_myTypeRef = (null == casFeat_myTypeRef) ?
JCas.INVALID_FEATURE_CODE : ((FeatureImpl)casFeat_myTypeRef).getCode();
}
}
{code}
> Getting feature value from feature structure longer than expected
> -----------------------------------------------------------------
>
> Key: UIMA-3017
> URL: https://issues.apache.org/jira/browse/UIMA-3017
> Project: UIMA
> Issue Type: Improvement
> Components: Core Java Framework
> Affects Versions: 2.3
> Environment: Linux x86_64
> Reporter: Mike Barborak
> Priority: Minor
>
> Should getting a value of a feature in a feature structure be fast?
> Intuitively, I would expect performance to be about the same as getting an
> entry from a Java HashMap or faster but in my experiments it seems to be 8
> times slower. To solve my problem, I wrap my feature structures with caching
> Java code but it seems that there might be an opportunity to speed up UIMA
> generally.
> My test creates a CAS with a single feature structure in it. It sets a string
> feature in that feature structure and then simply gets the value of that
> feature in a tight loop. I compare that to an instance of a Java class that
> has an internal HashMap of strings to strings. In that case, a method is
> called on that instance to get an entry from the map in a very tight loop.
> I do 5 rounds of each of the loops. The total times for the rounds involving
> the CAS were:
> round 0 total time 1: 7.520104509s
> round 1 total time 1: 6.812214938s
> round 2 total time 1: 6.882752307s
> round 3 total time 1: 6.728515004s
> round 4 total time 1: 6.813674956s
> The total times for the rounds just using the Java class were:
> round 0 total time 2: 0.847296054s
> round 1 total time 2: 0.814570347s
> round 2 total time 2: 0.814399859s
> round 3 total time 2: 0.814189383s
> round 4 total time 2: 0.814979357s
> Here is my Java code:
> {code:title=MyTest.java}
> package test;
> import java.io.InputStream;
> import java.util.HashMap;
> import java.util.Map;
> import org.apache.uima.UIMAFramework;
> import org.apache.uima.cas.CAS;
> import org.apache.uima.cas.Feature;
> import org.apache.uima.cas.FeatureStructure;
> import org.apache.uima.cas.Type;
> import org.apache.uima.resource.metadata.TypeSystemDescription;
> import org.apache.uima.util.CasCreationUtils;
> import org.apache.uima.util.XMLInputSource;
> public class MyTest {
>
> static class MyClass {
> Map<String, String> myFeatures = new HashMap<String, String>();
>
> void setStringValue(String feature, String value) {
> myFeatures.put(feature, value);
> }
>
> String getStringValue(String feature) {
> return myFeatures.get(feature);
> }
> }
>
> static public void main(String[] argv) throws Exception {
> InputStream stream =
> TestSupport.class.getClassLoader().getResourceAsStream("MyTypes.xml");
> TypeSystemDescription typeSystemDescription =
> UIMAFramework.getXMLParser().parseTypeSystemDescription(new
> XMLInputSource(stream, null));
> CAS cas = CasCreationUtils.createCas(typeSystemDescription, null, null);
> Type myType = cas.getTypeSystem().getType("MyType");
> FeatureStructure fs = cas.createFS(myType);
> Feature myFeature = myType.getFeatureByBaseName("myFeature");
> fs.setStringValue(myFeature, "myString");
> cas.addFsToIndexes(fs);
>
> MyClass myInstance = new MyClass();
> myInstance.setStringValue("myFeature2", "myString2");
>
> long iterations = 100000000;
> double nanoSecsPerSec = 1000000000.0d;
>
> for (int round = 0; round < 5; round++) {
> long start = System.nanoTime();
> for (long i = 0; i < iterations; i++) {
> fs.getStringValue(myFeature);
> }
> long end = System.nanoTime();
> System.out.println("round " + round + " total time 1: " + ((end -
> start) / nanoSecsPerSec) + "s");
> }
>
> for (int round = 0; round < 5; round++) {
> long start = System.nanoTime();
> for (long i = 0; i < iterations; i++) {
> myInstance.getStringValue("myFeature2");
> }
> long end = System.nanoTime();
> System.out.println("round " + round + " total time 2: " + ((end -
> start) / nanoSecsPerSec) + "s");
> }
> }
> }
> {code}
> Here is my type descriptor:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
> <name>MyTypes</name>
> <description/>
> <version>1.0</version>
> <vendor/>
> <types>
> <typeDescription>
> <name>MyType</name>
> <description/>
> <supertypeName>uima.cas.TOP</supertypeName>
> <features>
> <featureDescription>
> <name>myFeature</name>
> <description></description>
> <rangeTypeName>uima.cas.String</rangeTypeName>
> </featureDescription>
> </features>
> </typeDescription>
> </types>
> </typeSystemDescription>
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira