Hello,
Here is the Python code.
The code works as expected most of the time.
Regards,
William Johnston
From: Slide
Sent: Wednesday, December 16, 2015 6:19 PM
To: William Johnston ; [email protected]
Subject: Re: [Ironpython-users] debug C# IronPython app?
Can you give the code for the tagger method?
On Wed, Dec 16, 2015 at 4:13 PM William Johnston <[email protected]> wrote:
Hello,
Here it is:
private dynamic textrazortagger = runtime.UseFile(@"E:\Users\William
Johnston\Documents\Visual Studio 2010\Projects\Visual Studio
Projects\TextRazor\pos.py");
Sincerely,
William Johnston
From: Slide
Sent: Wednesday, December 16, 2015 5:37 PM
To: William Johnston ; [email protected]
Subject: Re: [Ironpython-users] debug C# IronPython app?
Forgive my ignorance, but what is textrazortagger?
On Wed, Dec 16, 2015 at 2:23 PM William Johnston <[email protected]>
wrote:
Hello,
Here is my code:
public List<MyPythonTuple> TextRazorTagger(string str)
{
List<MyPythonTuple> ret = new List<MyPythonTuple>();
try
{
IronPython.Runtime.List results = textrazortagger.tagger(str);
foreach (IronPython.Runtime.PythonTuple tuple in results)
{
string strWord = (string)tuple[0];
string strPos = (string)tuple[1];
if (strPos.Length == 0)
{
continue;
}
MyPythonTuple myResult = new MyPythonTuple();
myResult.Word = strWord;
myResult.Pos = strPos;
ret.Add(myResult);
}
}
catch (Exception ex)
{
throw new Exception(ex.Message);
}
return ret;
}
tuple1 returns an empty string.
Thanks,
William Johnston
From: Slide
Sent: Tuesday, December 15, 2015 5:13 PM
To: William Johnston ; [email protected]
Subject: Re: [Ironpython-users] debug C# IronPython app?
Microsoft isn't really involved with IronPython anymore, it's a completely
open source project with no developers from MS really spending time on it (Dino
does help out). Providing some code might allow someone on this list to help
out.
slide
On Tue, Dec 15, 2015 at 1:53 PM William Johnston <[email protected]>
wrote:
Hello,
A C# DLR app is not returning results correctly. (A part of speech
tagger is returning an empty string for the actual pos for certain strings.)
(The second PythonTuple value from an IronPython List is empty.)
Howver, the Python script does run from a Python Shell.
How would I go about debugging the app?
Do you know if Microsoft provides paid support?
Thanks.
Regards,
William Johnston
_______________________________________________
Ironpython-users mailing list
[email protected]
https://mail.python.org/mailman/listinfo/ironpython-users
_______________________________________________
Ironpython-users mailing list
[email protected]
https://mail.python.org/mailman/listinfo/ironpython-users
_______________________________________________
Ironpython-users mailing list
[email protected]
https://mail.python.org/mailman/listinfo/ironpython-users
from textrazor import TextRazor
# basic instantiation. TODO Put your authentication keys here.
client = TextRazor("", extractors=["words"])
def tagger(text) :
response = client.analyze(text)
words = response.words()
seen = []
for word in words:
print word.token, word.part_of_speech
seen.append((word.token, word.part_of_speech))
return seen
"""
Copyright (c) 2014 TextRazor, http://textrazor.com/
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the Software
is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""
try:
from urllib2 import Request, urlopen, HTTPError, URLError
from urllib import urlencode
except ImportError:
from urllib.request import Request, urlopen
from urllib.parse import urlencode
from urllib.error import HTTPError, URLError
import re
try:
import simplejson as json
except ImportError:
import json
import cStringIO as IOStream
import gzip
class TextRazorAnalysisException(BaseException):
pass
class Topic(object):
"""Represents a single abstract topic extracted from the input text.
Requires the "topics" extractor to be added to the TextRazor request.
"""
def __init__(self, topic_json, link_index):
self._topic_json = topic_json
for callback, arg in link_index.get(("topic", self.id), []):
callback(arg, self)
@property
def id(self):
"""The unique id of this annotation within its annotation set. """
return self._topic_json.get("id", None)
@property
def label(self):
"""Returns the label for this topic."""
return self._topic_json.get("label", "")
@property
def wikipedia_link(self):
"""Returns a link to Wikipedia for this topic, or None if this topic
couldn't be linked to a wikipedia page."""
return self._topic_json.get("wikiLink", None)
@property
def score(self):
"""Returns the relevancy score of this topic to the query document."""
return self._topic_json.get("score", 0)
def __repr__(self):
return "TextRazor Topic %s with label %s" % (str(self.id),
str(self.label))
def __str__(self):
out = ["TextRazor Topic %s and label %s:" % (str(self.id),
str(self.label)), "\n"]
for property in dir(self):
if not property.startswith("_") and not property == "id":
out.extend([property, ":", repr(getattr(self, property)), "\n"])
return " ".join(out)
class Entity(object):
"""Represents a single "Named Entity" extracted from the input text.
Requires the "entities" extractor to be added to the TextRazor request.
"""
def __init__(self, entity_json, link_index):
self._response_entity = entity_json
self._matched_words = []
for callback, arg in link_index.get(("entity", self.document_id), []):
callback(arg, self)
for position in self.matched_positions:
try:
link_index[("word", position)].append((self._register_link,
None))
except KeyError as ex:
link_index[("word", position)] = [(self._register_link, None)]
def _register_link(self, dummy, word):
self._matched_words.append(word)
word._add_entity(self)
@property
def document_id(self):
return self._response_entity.get("id", None)
@property
def id(self):
"""Returns the disambiguated ID for this entity, or None if this entity
could not be disambiguated. """
return self._response_entity.get("entityId", None)
@property
def freebase_id(self):
"""Returns the disambiguated Freebase ID for this entity, or None if
either
this entity could not be disambiguated, or a Freebase link doesn't
exist."""
return self._response_entity.get("freebaseId", None)
@property
def wikipedia_link(self):
"""Returns a link to Wikipedia for this entity, or None if either this
entity
could not be disambiguated or a Wikipedia link doesn't exist."""
return self._response_entity.get("wikiLink", None)
@property
def matched_text(self):
"""Returns the source text string that matched this entity."""
return self._response_entity.get("matchedText", None)
@property
def starting_position(self):
return self._response_entity.get("startingPos", None)
@property
def ending_position(self):
return self._response_entity.get("endingPos", None)
@property
def matched_positions(self):
"""Returns a list of the token positions in the current sentence that
make up this entity."""
return self._response_entity.get("matchingTokens", [])
@property
def matched_words(self):
"""Returns a list of :class:`Word` that make up this entity."""
return self._matched_words
@property
def freebase_types(self):
"""Returns a list of Freebase types for this entity, or an empty list
if there are none."""
return self._response_entity.get("freebaseTypes", [])
@property
def relevance_score(self):
"""Returns the relevance this entity has to the source text. This is a
float on a scale of 0 to 1,
with 1 being the most relevant. Relevance is determined by the
contextual similarity between the entities
context and facts in the TextRazor knowledgebase."""
return self._response_entity.get("relevanceScore", None)
@property
def confidence_score(self):
"""Returns the confidence that TextRazor is correct that this is a
valid entity. TextRazor uses an ever increasing
number of signals to help spot valid entities, all of which contribute
to this score. These include the contextual
agreement between the words in the source text and our knowledgebase,
agreement between other entities in the text,
agreement between the expected entity type and context, prior
probabilities of having seen this entity across wikipedia
and other web datasets. The score ranges from 0.5 to 10, with 10
representing the highest confidence that this is
a valid entity."""
return self._response_entity.get("confidenceScore", None)
@property
def dbpedia_types(self):
"""Returns a list of dbpedia types for this entity, or an empty list if
there are none."""
return self._response_entity.get("type", [])
@property
def data(self):
""" Returns a dictionary containing enriched data found for this
entity. """
return self._response_entity.get("data", {})
def __repr__(self):
return "TextRazor Entity %s at positions %s" %
(self.id.encode("utf-8"), str(self.matched_positions))
def __str__(self):
out = ["TextRazor Entity with id:", self.id.encode("utf-8"), "\n"]
for property in dir(self):
if not property.startswith("_") and not property == "id":
out.extend([property, ":", repr(getattr(self, property)), "\n"])
return " ".join(out)
class Entailment(object):
"""Represents a single "entailment" derived from the source text.
Requires the "entailments" extractor to be added to the TextRazor request.
"""
def __init__(self, entailment_json, link_index):
self.entailment_json = entailment_json
self._matched_words = []
for callback, arg in link_index.get(("entailment", self.id), []):
callback(arg, self)
for position in self.matched_positions:
try:
link_index[("word", position)].append((self._register_link,
None))
except KeyError as ex:
link_index[("word", position)] = [(self._register_link, None)]
def _register_link(self, dummy, word):
self._matched_words.append(word)
word._add_entailment(self)
@property
def matched_positions(self):
"""Returns the token positions in the current sentence that generated
this entailment."""
return self.entailment_json.get("wordPositions", [])
@property
def matched_words(self):
"""Returns links the :class:`Word` in the current sentence that
generated this entailment."""
return self._matched_words
@property
def id(self):
"""The unique id of this annotation within its annotation set. """
return self.entailment_json.get("id", None)
@property
def prior_score(self):
"""Returns the score of this entailment independent of the context it
is used in this sentence."""
return self.entailment_json.get("priorScore", None)
@property
def context_score(self):
"""Returns the score of agreement between the source word's usage in
this sentence and the entailed words
usage in our knowledgebase."""
return self.entailment_json.get("contextScore", None)
@property
def score(self):
"""Returns the overall confidence that TextRazor is correct that this
is a valid entailment, a combination
of the prior and context score."""
return self.entailment_json.get("score", None)
@property
def entailed_word(self):
"""Returns the word string that is entailed by the source words."""
entailed_tree = self.entailment_json.get("entailedTree", None)
if entailed_tree:
return entailed_tree.get("word", None)
def __repr__(self):
return "TextRazor Entailment:\"%s\" at positions %s" %
(str(self.entailed_word), str(self.matched_positions))
def __str__(self):
out = ["TextRazor Entailment:", str(self.entailed_word), "\n"]
for property in dir(self):
if not property.startswith("_") and not property == "id":
out.extend([property, ":", repr(getattr(self, property)), "\n"])
return " ".join(out)
class RelationParam(object):
"""Represents a Param to a specific :class:`Relation`.
Requires the "relations" extractor to be added to the TextRazor request."""
def __init__(self, param_json, relation_parent, link_index):
self._param_json = param_json
self._relation_parent = relation_parent
self._param_words = []
for position in self.param_positions:
try:
link_index[("word", position)].append((self._register_link,
None))
except KeyError as ex:
link_index[("word", position)] = [(self._register_link, None)]
def _register_link(self, dummy, word):
self._param_words.append(word)
word._add_relation_param(self)
@property
def relation_parent(self):
"""Returns the :class:`Relation` that owns this param."""
return self._relation_parent
@property
def relation(self):
"""Returns the relation of this param to the predicate:
Possible values: SUBJECT, OBJECT, OTHER"""
return self._param_json.get("relation", None)
@property
def param_positions(self):
"""Returns a list of the positions of the words in this param within
their sentence."""
return self._param_json.get("wordPositions", [])
@property
def param_words(self):
"""Returns a list of all the :class:`Word` that make up this param."""
return self._param_words
def entities(self):
"""Returns a generator of all :class:`Entity` mentioned in this
param."""
seen = set()
for word in self.param_words:
for entity in word.entities:
if entity not in seen:
seen.add(entity)
yield entity
def __repr__(self):
return "TextRazor RelationParam:\"%s\" at positions %s" %
(str(self.relation), str(self.param_words))
def __str__(self):
return repr(self)
class NounPhrase(object):
"""Represents a multi-word phrase extracted from a sentence.
Requires the "relations" extractor to be added to the TextRazor request."""
def __init__(self, noun_phrase_json, link_index):
self._noun_phrase_json = noun_phrase_json
self._words = []
for callback, arg in link_index.get(("nounPhrase", self.id), []):
callback(arg, self)
for position in self.word_positions:
try:
link_index[("word", position)].append((self._register_link,
None))
except KeyError as ex:
link_index[("word", position)] = [(self._register_link, None)]
def _register_link(self, dummy, word):
self._words.append(word)
word._add_noun_phrase(self)
@property
def id(self):
"""The unique id of this annotation within its annotation set. """
return self._noun_phrase_json.get("id", None)
@property
def word_positions(self):
"""Returns a list of the positions of the words in this phrase."""
return self._noun_phrase_json.get("wordPositions", [])
@property
def words(self):
"""Returns a list of :class:`Word` that make up this phrase."""
return self._words
def __repr__(self):
return "TextRazor NounPhrase at positions %s" % (str(self.words))
def __str__(self):
out = ["TextRazor NounPhrase:", str(self.word_positions), "\n"]
for property in dir(self):
if not property.startswith("_") and not property ==
"word_positions":
out.extend([property, ":", repr(getattr(self, property)), "\n"])
return " ".join(out)
class Property(object):
"""Represents a property relation extracted from raw text. A property
implies an "is-a" or "has-a" relationship
between the predicate (or focus) and its property.
Requires the "relations" extractor to be added to the TextRazor request.
"""
def __init__(self, property_json, link_index):
self._property_json = property_json
self._predicate_words = []
self._property_words = []
for callback, arg in link_index.get(("property", self.id), []):
callback(arg, self)
for position in self.predicate_positions:
try:
link_index[("word", position)].append((self._register_link,
True))
except KeyError as ex:
link_index[("word", position)] = [(self._register_link, True)]
for position in self.property_positions:
try:
link_index[("word", position)].append((self._register_link,
False))
except KeyError as ex:
link_index[("word", position)] = [(self._register_link, False)]
def _register_link(self, is_predicate, word):
if is_predicate:
self._predicate_words.append(word)
word._add_property_predicate(self)
else:
self._property_words.append(word)
word._add_property_properties(self)
@property
def id(self):
"""The unique id of this annotation within its annotation set. """
return self._property_json.get("id", None)
@property
def predicate_positions(self):
"""Returns a list of the positions of the words in the predicate (or
focus) of this property."""
return self._property_json.get("wordPositions", [])
@property
def predicate_words(self):
"""Returns a list of TextRazor words that make up the predicate (or
focus) of this property."""
return self._predicate_words
@property
def property_positions(self):
"""Returns a list of word positions that make up the modifier of the
predicate of this property."""
return self._property_json.get("propertyPositions", [])
@property
def property_words(self):
"""Returns a list of :class:`Word` that make up the property that
targets the focus words."""
return self._property_words
def __repr__(self):
return "TextRazor Property at positions %s" %
(str(self.predicate_positions))
def __str__(self):
out = ["TextRazor Property:", str(self.predicate_positions), "\n"]
for property in dir(self):
if not property.startswith("_") and not property ==
"predicate_positions":
out.extend([property, ":", repr(getattr(self, property)), "\n"])
return " ".join(out)
class Relation(object):
"""Represents a grammatical relation between words. Typically owns a
number of
:class:`RelationParam`, representing the SUBJECT and OBJECT of the relation.
Requires the "relations" extractor to be added to the TextRazor request."""
def __init__(self, relation_json, link_index):
self._relation_json = relation_json
self._params = [RelationParam(param, self, link_index) for param in
relation_json["params"]]
self._predicate_words = []
for callback, arg in link_index.get(("relation", self.id), []):
callback(arg, self)
for position in self.predicate_positions:
try:
link_index[("word", position)].append((self._register_link,
None))
except KeyError as ex:
link_index[("word", position)] = [(self._register_link, None)]
def _register_link(self, dummy, word):
self._predicate_words.append(word)
word._add_relation(self)
@property
def id(self):
"""The unique id of this annotation within its annotation set. """
return self._relation_json.get("id", None)
@property
def predicate_positions(self):
"""Returns a list of the positions of the predicate words in this
relation within their sentence."""
return self._relation_json.get("wordPositions", [])
@property
def predicate_words(self):
"""Returns a list of the TextRazor words in this relation."""
return self._predicate_words
@property
def params(self):
"""Returns a list of the TextRazor params of this relation."""
return self._params
def __repr__(self):
return "TextRazor Relation at positions %s" %
(str(self.predicate_words))
def __str__(self):
out = ["TextRazor Relation:", str(self.predicate_words), "\n"]
for property in dir(self):
if not property.startswith("_") and not property ==
"predicate_positions":
out.extend([property, ":", repr(getattr(self, property)), "\n"])
return " ".join(out)
class Word(object):
"""Represents a single Word (token) extracted by TextRazor.
Requires the "words" extractor to be added to the TextRazor request."""
def __init__(self, response_word, link_index):
self._response_word = response_word
self._parent = None
self._children = []
self._entities = []
self._entailments = []
self._relations = []
self._relation_params = []
self._property_predicates = []
self._property_properties = []
self._noun_phrases = []
for callback, arg in link_index.get(("word", self.position), []):
callback(arg, self)
def _add_child(self, child):
self._children.append(child)
def _set_parent(self, parent):
self._parent = parent
parent._add_child(self)
def _add_entity(self, entity):
self._entities.append(entity)
def _add_entailment(self, entailment):
self._entailments.append(entailment)
def _add_relation(self, relation):
self._relations.append(relation)
def _add_relation_param(self, relation_param):
self._relation_params.append(relation_param)
def _add_property_predicate(self, property):
self._property_predicates.append(property)
def _add_property_properties(self, property):
self._property_properties.append(property)
def _add_noun_phrase(self, noun_phrase):
self._noun_phrases.append(noun_phrase)
@property
def parent_position(self):
"""Returns the position of the grammatical parent of this word, or None
if this word is either at the root
of the sentence or the "dependency-trees" extractor was not
requested."""
return self._response_word.get("parentPosition", None)
@property
def parent(self):
"""Returns a link to the TextRazor word that is parent of this word, or
None if this word is either at the root
of the sentence or the "dependency-trees" extractor was not
requested."""
return self._parent
@property
def relation_to_parent(self):
"""Returns the Grammatical relation between this word and it's parent,
or None if this word is either at the root
of the sentence or the "dependency-trees" extractor was not requested.
TextRazor parses into the Stanford uncollapsed dependencies, as
detailed at:
http://nlp.stanford.edu/software/dependencies_manual.pdf
"""
return self._response_word.get("relationToParent", None)
@property
def children(self):
"""Returns a list of TextRazor words that make up the children of this
word. Returns an empty list
for leaf words, or if the "dependency-trees" extractor was not
requested."""
return self._children
@property
def position(self):
"""Returns the position of this word in its sentence."""
return self._response_word.get("position", None)
@property
def stem(self):
"""Returns the stem of this word"""
return self._response_word.get("stem", None)
@property
def lemma(self):
"""Returns the morphological root of this word, see
http://en.wikipedia.org/wiki/Lemma_(morphology)
for details."""
return self._response_word.get("lemma", None)
@property
def token(self):
"""Returns the raw token string that matched this word in the source
text."""
return self._response_word.get("token", None)
@property
def part_of_speech(self):
"""Returns the Part of Speech that applies to this word. We use the
Penn treebank tagset,
as detailed here:
http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html"""
return self._response_word.get("partOfSpeech", None)
@property
def input_start_offset(self):
"""Returns the start offset in the input text for this token. Note
that this offset applies to the
original Unicode string passed in to the api, TextRazor treats multi
byte utf8 charaters as a single position."""
return self._response_word.get("startingPos", None)
@property
def input_end_offset(self):
"""Returns the end offset in the input text for this token. Note that
this offset applies to the
original Unicode string passed in to the api, TextRazor treats multi
byte utf8 charaters as a single position."""
return self._response_word.get("endingPos", None)
@property
def entailments(self):
"""Returns a list of :class:`Entailment` that this word entails."""
return self._entailments
@property
def entities(self):
"""Returns a list of :class:`Entity` that this word is a part of."""
return self._entities
@property
def relations(self):
"""Returns a list of :class:`Relation` that this word is a predicate
of."""
return self._relations
@property
def relation_params(self):
"""Returns a list of :class:`RelationParam` that this word is a member
of."""
return self._relation_params
@property
def property_properties(self):
"""Returns a list of :class:`Property` that this word is a property
member of."""
return self._property_properties
@property
def property_predicates(self):
"""Returns a list of :class:`Property` that this word is a predicate
(or focus) member of."""
return self._property_predicates
@property
def noun_phrases(self):
"""Returns a list of :class:`NounPhrase` that this word is a member
of."""
return self._noun_phrases
@property
def senses(self):
"""Returns a list of (sense, score) tuples representing scores of each
Wordnet sense this this word may be a part of."""
return self._response_word.get("senses", [])
def __repr__(self):
return "TextRazor Word:\"%s\" at position %s" %
((self.token).encode("utf-8"), str(self.position))
def __str__(self):
out = ["TextRazor Word:", str(self.token.encode("utf-8")), "\n"]
for property in dir(self):
if not property.startswith("_") and not property == "token":
out.extend([property, ":", repr(getattr(self, property)), "\n"])
return " ".join(out)
class Sentence(object):
"""Represents a single sentence extracted by TextRazor."""
def __init__(self, sentence_json, link_index):
if "words" in sentence_json:
self._words = [Word(word_json, link_index) for word_json in
sentence_json["words"]]
else:
self._words = []
self._add_links(link_index)
def _add_links(self, link_index):
if not self._words:
return
self._root_word = None
# Add links between the parent/children of the dependency tree in this
sentence.
word_positions = {}
for word in self._words:
word_positions[word.position] = word
for word in self._words:
parent_position = word.parent_position
if None != parent_position and parent_position >= 0:
word._set_parent(word_positions[parent_position])
else:
# Punctuation does not get attached to any parent, any non
punctuation part of speech
# must be the root word.
if word.part_of_speech not in ("$", "``", "''", "(", ")", ",",
"--", ".", ":"):
self._root_word = word
@property
def root_word(self):
"""Returns the root word of this sentence if "dependency-trees"
extractor was requested."""
return self._root_word
@property
def words(self):
"""Returns a list of all the :class:`Word` in this sentence."""
return self._words
class CustomAnnotation(object):
def __init__(self, annotation_json, link_index):
self._annotation_json = annotation_json
for key_value in annotation_json.get("contents", []):
for link in key_value.get("links", []):
try:
link_index[(link["annotationName"],
link["linkedId"])].append((self._register_link, link))
except Exception as ex:
link_index[(link["annotationName"], link["linkedId"])] =
[(self._register_link, link)]
def _register_link(self, link, annotation):
link["linked"] = annotation
new_custom_annotation_list = []
try:
new_custom_annotation_list = getattr(annotation, self.name());
except Exception as ex:
pass
new_custom_annotation_list.append(self)
setattr(annotation, self.name(), new_custom_annotation_list)
def name(self):
return self._annotation_json["name"]
def __getattr__(self, attr):
exists = False
for key_value in self._annotation_json["contents"]:
if "key" in key_value and key_value["key"] == attr:
exists = True
for link in key_value.get("links", []):
try:
yield link["linked"]
except Exception as ex:
yield link
for int_value in key_value.get("intValue", []):
yield int_value
for float_value in key_value.get("floatValue", []):
yield float_value
for str_value in key_value.get("stringValue", []):
yield str_value
for bytes_value in key_value.get("bytesValue", []):
yield bytes_value
if not exists:
raise AttributeError("%r annotation has no attribute %r" %
(self.name(), attr))
def __repr__(self):
return "TextRazor CustomAnnotation:\"%s\"" %
(self._annotation_json["name"])
def __str__(self):
out = ["TextRazor CustomAnnotation:",
str(self._annotation_json["name"]), "\n"]
for key_value in self._annotation_json["contents"]:
try:
out.append("Param %s:" % key_value["key"])
except Exception as ex:
out.append("Param (unlabelled):")
out.append("\n")
for link in self.__getattr__(key_value["key"]):
out.append(repr(link))
out.append("\n")
return " ".join(out)
class TextRazorResponse(object):
"""Represents a processed response from TextRazor."""
def __init__(self, response_json):
self.response_json = response_json
self.sentences = []
self.custom_annotations = []
link_index = {}
if "response" in self.response_json:
# There's a bit of magic here. Each annotation registers a
callback with the ids and types of annotation
# that it is linked to. When the linked annotation is later parsed
it adds the link via the callback.
# This means that annotations must be added in order of the
dependency between them.
if "customAnnotations" in self.response_json["response"]:
self.custom_annotations = [CustomAnnotation(json, link_index)
for json in self.response_json["response"]["customAnnotations"]]
if "topics" in self.response_json["response"]:
self._topics = [Topic(topic_json, link_index) for topic_json in
self.response_json["response"]["topics"]]
if "coarseTopics" in self.response_json["response"]:
self._coarse_topics = [Topic(topic_json, link_index) for
topic_json in self.response_json["response"]["coarseTopics"]]
if "entities" in self.response_json["response"]:
self._entities = [Entity(entity_json, link_index) for
entity_json in self.response_json["response"]["entities"]]
else:
self._entities = []
if "entailments" in self.response_json["response"]:
self._entailments = [Entailment(entailment_json, link_index)
for entailment_json in self.response_json["response"]["entailments"]]
else:
self._entailments = []
if "relations" in self.response_json["response"]:
self._relations = [Relation(relation_json, link_index) for
relation_json in self.response_json["response"]["relations"]]
else:
self._relations = []
if "properties" in self.response_json["response"]:
self._properties = [Property(property_json, link_index) for
property_json in self.response_json["response"]["properties"]]
else:
self._properties = []
if "nounPhrases" in self.response_json["response"]:
self._noun_phrases = [NounPhrase(phrase_json, link_index) for
phrase_json in self.response_json["response"]["nounPhrases"]]
else:
self._noun_phrases = []
if "sentences" in self.response_json["response"]:
self.sentences = [Sentence(sentence_json, link_index) for
sentence_json in self.response_json["response"]["sentences"]]
@property
def cleaned_text(self):
return self.response_json["response"].get("cleanedText", "")
def summary(self):
return """Request processed in: %s seconds. Num Sentences:%s""" % \
(self.response_json["time"],
len(self.response_json["response"]["sentences"]))
def custom_annotation_output(self):
"""Returns any output generated while running the embedded prolog
engine on your rules."""
return self.response_json["response"].get("customAnnotationOutput", "")
def coarse_topics(self):
"""Returns a list of all the coarse :class:`Topic` in the response. """
return self._coarse_topics
def topics(self):
"""Returns a list of all the :class:`Topic` in the response. """
return self._topics
def entities(self):
"""Returns a list of all the :class:`Entity` across all sentences in
the response."""
return self._entities
def words(self):
"""Returns a generator of all :class:`Word` across all sentences in the
response."""
for sentence in self.sentences:
for word in sentence.words:
yield word
def entailments(self):
"""Returns a list of all :class:`Entailment` across all sentences in
the response."""
return self._entailments
def relations(self):
"""Returns a list of all :class:`Relation` across all sentences in the
response."""
return self._relations
def properties(self):
"""Returns a list of all :class:`Property` across all sentences in the
response."""
return self._properties
def noun_phrases(self):
"""Returns a list of all the :class:`NounPhrase` across all sentences
in the response."""
return self._noun_phrases
def sentences(self):
"""Returns a list of all :class:`Sentence` in the response."""
return self.sentences
def matching_rules(self):
return [custom_annotation.name() for custom_annotation in
self.custom_annotations]
def __getattr__(self, attr):
exists = False
for custom_annotation in self.custom_annotations:
if custom_annotation.name() == attr:
exists = True
yield custom_annotation
if not exists:
raise AttributeError("TextRazor response has no annotation %r" %
attr)
class TextRazor(object):
"""
The main TextRazor client. To process your text, create a
:class:`TextRazor` instance with your API key
and set the extractors you need to process the text. Calls to
:meth:`analyze` and :meth:`analyze_url` will then process raw text or URLs
, returning a :class:`TextRazorResponse` on success.
This class is threadsafe once initialized with the request options. You
should create a new instance for each request
if you are likely to be changing the request options in a multithreaded
environment.
Below is an entity extraction example from the tutorial, you can find more
examples at http://www.textrazor.com/tutorials.
>>> client = TextRazor(api_key="DEMO", extractors=["entities"])
>>> client.set_do_cleanup_HTML(True)
>>>
>>> response =
client.analyze_url("http://www.bbc.co.uk/news/uk-politics-18640916")
>>>
>>> entities = list(response.entities())
>>> entities.sort(key=lambda x: x.relevance_score, reverse=True)
>>>
>>> seen = set()
>>> for entity in entities:
>>> if entity.id not in seen:
>>> print entity.id, entity.relevance_score,
entity.confidence_score, entity.freebase_types
>>> seen.add(entity.id)
"""
_SECURE_TEXTRAZOR_ENDPOINT = "https://api.textrazor.com/"
_TEXTRAZOR_ENDPOINT = "http://api.textrazor.com/"
def __init__(self, api_key, extractors, do_compression=True,
do_encryption=False):
self.api_key = api_key
self.extractors = extractors
self.do_compression = do_compression
self.do_encryption = do_encryption
self.cleanup_html = False
self.rules = ""
self.language_override = None
self.enrichment_queries = []
self.dbpedia_type_filters = []
self.freebase_type_filters = []
self.allow_overlap = None
def set_api_key(self, api_key):
"""Sets the TextRazor API key, required for all requests."""
self.api_key = api_key
def set_extractors(self, extractors):
"""Sets a list of "Extractors" which extract various information from
your text.
Only select the extractors that are explicitly required by your
application for optimal performance.
Any extractor that doesn't match one of the predefined list below will
be assumed to be a custom Prolog extractor.
Valid options are: words, phrases, entities, dependency-trees,
relations, entailments. """
self.extractors = extractors
def set_rules(self, rules):
"""Sets a string containing Prolog logic. All rules matching an
extractor name listed in the request will be evaluated
and all matching param combinations linked in the response. """
self.rules = rules
def set_do_compression(self, do_compression):
"""When True, request gzipped responses from TextRazor. When expecting
a large response this can
significantly reduce bandwidth. Defaults to True."""
self.do_compression = do_compression
def set_do_encryption(self, do_encryption):
"""When True, all communication to TextRazor will be sent over SSL,
when handling sensitive
or private information this should be set to True. Defaults to
False."""
self.do_encryption = do_encryption
def set_enrichment_queries(self, enrichment_queries):
"""Set a list of "Enrichment Queries", used to enrich the entity
response with structured linked data.
The syntax for these queries is documented at
https://www.textrazor.com/enrichment """
self.enrichment_queries = enrichment_queries
def set_language_override(self, language_override):
self.language_override = language_override
def set_do_cleanup_HTML(self, cleanup_html):
"""When True, input text is treated as raw HTML and will be cleaned of
tags, comments, scripts,
and boilerplate content removed. When this option is enabled, the
cleaned_text property is returned
with the text content, providing access to the raw filtered text. When
enabled, position offsets returned
in individual words apply to the clean text, not the provided HTML."""
self.cleanup_html = cleanup_html
def set_entity_allow_overlap(self, allow_overlap):
"""When allow_overlap is True, entities in the response may overlap.
When False, the "best" entity
is found such that none overlap. Defaults to True. """
self.allow_overlap = allow_overlap
def set_entity_dbpedia_type_filters(self, filters):
"""Set a list of DBPedia types to filter entity extraction on. All
returned entities must
match at least one of these types."""
self.dbpedia_type_filters = filters
def set_entity_freebase_type_filters(self, filters):
"""Set a list of Freebase types to filter entity extraction on. All
returned entities must
match at least one of these types."""
self.freebase_type_filters = filters
def analyze_url(self, url, headers={}):
"""Given a url and optional dict of HTTP headers, first downloads the
URL then processes the
resulting text. If you expect HTML in the response, you may want to
set :meth:`set_do_cleanup_HTML`
to true to filter unwanted HTML content.
Returns a :class:`TextRazorResponse` with the parsed data on success.
Raises a :class:`TextRazorAnalysisException` on failure. """
req = Request(url, headers=headers)
response = urlopen(req)
text = response.read().decode("utf-8", "ignore")
return self.analyze(text)
def analyze(self, text):
"""Calls the TextRazor API with the provided unicode text.
Returns a :class:`TextRazorResponse` with the parsed data on success.
Raises a :class:`TextRazorAnalysisException` on failure. """
post_data = [("text", text.encode("utf-8")),
("apiKey", self.api_key),
("rules", self.rules),
("extractors", ",".join(self.extractors)),
("cleanupHTML", self.cleanup_html)]
for filter in self.dbpedia_type_filters:
post_data.append(("entities.filterDbpediaTypes", filter))
for filter in self.freebase_type_filters:
post_data.append(("entities.filterFreebaseTypes", filter))
for query in self.enrichment_queries:
post_data.append(("entities.enrichmentQueries", query))
if self.language_override != None:
post_data.append(("languageOverride", self.language_override))
if self.allow_overlap != None:
post_data.append(("entities.allowOverlap", self.allow_overlap))
encoded_post_data = urlencode(post_data)
request_headers = {}
if self.do_compression:
request_headers['Accept-encoding'] = 'gzip'
if self.do_encryption:
request = Request(self._SECURE_TEXTRAZOR_ENDPOINT,
headers=request_headers, data=encoded_post_data.encode("utf-8"))
else:
request = Request(self._TEXTRAZOR_ENDPOINT,
headers=request_headers, data=encoded_post_data.encode("utf-8"))
try:
response = urlopen(request)
except HTTPError as e:
raise TextRazorAnalysisException("TextRazor returned HTTP Code %d:
%s" % (e.code, e.read()))
except URLError as e:
raise TextRazorAnalysisException("Could not connect to TextRazor")
if response.info().get('Content-Encoding') == 'gzip':
buf = IOStream.StringIO( response.read())
response = gzip.GzipFile(fileobj=buf)
response_json = json.loads(response.read().decode("utf-8"))
return TextRazorResponse(response_json)
_______________________________________________
Ironpython-users mailing list
[email protected]
https://mail.python.org/mailman/listinfo/ironpython-users