Re: [mylyn-integrators] [WikiText] Is it possible to extract attributes using WikiText

David Green Wed, 15 Jul 2009 08:45:00 -0700

Luven,
The good news: The WikiText parser is used behind the scenes in the WikiText
editor -- so yes, it can give you the offsets that you need to know where
things are in the source markup.  The bad news: this functionality was
retrofitted into the WikiText parser and is probably not the most intuitive
API.


You're pointed in the right direction.
org.eclipse.mylyn.wikitext.core.parser.Locator is indeed the place to get
the offsets that you need.  The WikiText markup editor uses a
DocumentBuilder to build a model of the document with exact offsets.  A good
place to start looking is
org.eclipse.mylyn.internal.wikitext.ui.editor.syntax.FastMarkupPartitioner.PartitionBuilder.

If you feel that there are bugs in the implementation (including
documentation bugs or lacking documentation) please post a bug at
bugs.eclipse.org under Tools/Mylyn/WikiText.  If you're able to attach JUnit
tests that exercise/demonstrate the bug, that's even better.
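
For a JUnit test of that kind, the expected link offsets can be computed
independently of the parser and then compared with what the Locator reports.
The following is only a minimal sketch under stated assumptions: it uses a
plain java.util.regex pattern rather than any WikiText API (so it is not how
WikiText itself computes offsets), the class LinkOffsets and the method
findLinks are hypothetical names, and the sample text is the one from the
quoted message, assumed to be two lines separated by CRLF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WikiText: locates [[...]] links with a regex.
class LinkOffsets {

    // Matches a MediaWiki internal link such as [[target]] or [[target|label]].
    // Assumption: a link never spans a line break and never contains ']'.
    private static final Pattern LINK = Pattern.compile("\\[\\[[^\\]\\n]*\\]\\]");

    // Returns {beginOffset, endOffset} pairs; endOffset is exclusive.
    static List<int[]> findLinks(String markup) {
        List<int[]> offsets = new ArrayList<>();
        Matcher matcher = LINK.matcher(markup);
        while (matcher.find()) {
            offsets.add(new int[] { matcher.start(), matcher.end() });
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Sample text from the quoted message, assumed to use CRLF line endings.
        String markup = "The '''EditorX''' is an [[text editor|editor]] of small to medium-sized [[text]].\r\n"
                + "This is a [[test]] too ('''yes''').";
        for (int[] range : findLinks(markup)) {
            System.out.println(range[0] + ".." + range[1] + " "
                    + markup.substring(range[0], range[1]));
        }
    }
}
```

With that sample text the sketch reports the ranges 24..46, 72..80 and
93..101. A test could assert that the offsets seen in link(...) match these
values.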

To satisfy my curiosity, perhaps you could tell me more about your use case:
Why do you want to know these offsets?

Regards,

David

On Wed, Jul 15, 2009 at 8:24 AM, siluven <[email protected]> wrote:

>  Hello,
>
> Now I would like to get the begin and end offsets of each link in the
> wikitext, and later maybe of other components (images).
> Is it possible to do this using Mylyn WikiText?
>
> I tried to use getLocator().getDocumentOffset() in these methods of
> org.eclipse.mylyn.wikitext.core.parser.builder.NoOpDocumentBuilder:
>
> void link(Attributes attributes, String hrefOrHashName, String text)
> void characters(String text)
>
> but I found that the document offset is not updated after
> link(Attributes attributes, String hrefOrHashName, String text) is called.
> The offset is also sometimes decremented. As far as I know, it should only
> be incremented.
> I tested using the simple text below as my MediaWiki WikiText:
>
> The '''EditorX''' is an [[text editor|editor]] of small to medium-sized
> [[text]].
> This is a [[test]] too ('''yes''').
>
> The result:
>
> Offset  Component type    Value                         Comment
> -1      DOCUMENT_BEGIN    []
> 0       BLOCK_BEGIN       [PARAGRAPH]
> 0       CHARACTERS_GROUP  [The ]
> 4       SPAN_BEGIN        [BOLD]
> 7       CHARACTERS_GROUP  [EditorX]
> 7       SPAN_END          [BOLD]
> 17      CHARACTERS_GROUP  [ is an ]
> 7       LINK              [editor]                      The offset is somehow decremented (bug?)
> 7       CHARACTERS_GROUP  [ of small to medium-sized ]  From here on the offsets are incorrect
> 55      LINK              [text]
> 55      CHARACTERS_GROUP  [.]
> 83      CHARACTERS_GROUP  []                            New line position is correct now
> 83      CHARACTERS_GROUP  [This is a ]
> 93      LINK              [test]
> 93      CHARACTERS_GROUP  [ too (]                      93 is the offset of the link
> 107     SPAN_BEGIN        [BOLD]                        Here it is correct again
> 110     CHARACTERS_GROUP  [yes]
> 110     SPAN_END          [BOLD]
> 116     CHARACTERS_GROUP  [).]
> 118     BLOCK_END         [PARAGRAPH]
> 118     DOCUMENT_END      []
>
> For headings I am currently using:
>
> getLocator().getLineDocumentOffset() for beginOffset and
> getLocator().getLineDocumentOffset()+getLocator().getLineLength() for
> endOffset
>
> and it works so far.
> It does not work for links because there can be several links on the same
> line.
>
> Best regards,
>
> Luven
>
> On 6/25/2009 6:59 PM, siluven wrote:
>
> Thank you David,
> that is the functionality I need. I've tried also with headings and
> images.
> And it works too.
>
> Best regards,
> Luven
>
> On 6/24/2009 6:47 PM, David Green wrote:
>
> You want to do something like this:
>
> public class ExtractHyperlinksBuilder extends NoOpDocumentBuilder {
>
>     private Set<String> hyperlinks = new HashSet<String>();
>
>     @Override
>     public void link(Attributes attributes, String hrefOrHashName, String text) {
>         hyperlinks.add(hrefOrHashName);
>     }
>
>     @Override
>     public void imageLink(Attributes linkAttributes, Attributes imageAttributes,
>             String href, String imageUrl) {
>         hyperlinks.add(href);
>     }
>
>     public Set<String> getHyperlinks() {
>         return hyperlinks;
>     }
> }
>
> MarkupParser parser = new MarkupParser(
>         ServiceLocator.getInstance().getMarkupLanguage("MediaWiki"));
> ExtractHyperlinksBuilder builder = new ExtractHyperlinksBuilder();
> parser.setBuilder(builder);
>
> Reader markupContent = null; // open reader
> try {
>     parser.parse(markupContent);
> } finally {
>     markupContent.close();
> }
> // do something with builder.getHyperlinks()
>
>  Regards,
>
>  David
>
> On Wed, Jun 24, 2009 at 4:13 AM, siluven <[email protected]> wrote:
>
>> Hello everyone,
>> I am a new Mylyn user. I'm planning to work with wiki articles using Java
>> and Eclipse.
>>
>> Is it possible to extract attributes like hyperlinks, headings, images,
>> etc. directly from, e.g., MediaWiki markup using WikiText?
>> Something like:
>> - obj.getHyperlinks();
>>
>> If it is possible, or if there are other solutions for this, how could it
>> be done?
>>
>> Thank you and best regards
>> Luven
>>
>> _______________________________________________
>> mylyn-integrators mailing list
>> [email protected]
>> https://dev.eclipse.org/mailman/listinfo/mylyn-integrators
>>
>>

_______________________________________________
mylyn-integrators mailing list
[email protected]
https://dev.eclipse.org/mailman/listinfo/mylyn-integrators
